Emily Schoof

Los Angeles, CA | Emily.Schoof@gmail.com

ETL & Data Engineering | ML & Statistical Modeling | Data Strategy & Analytics

Highly accomplished professional with extensive experience building scalable ETL pipelines, machine learning models, and statistical solutions across healthcare, global health, and enterprise settings. Proven ability to translate complex data into actionable insights that drive business and operational outcomes. Expert in Python, PySpark, SQL, and cloud platforms, with hands-on experience in creating production-ready ML pipelines and predictive analytics. Adept at working closely with cross-functional teams to optimize data workflows, improve decision-making, and deliver measurable impact. Focused on delivering advanced analytics and engineering solutions for dynamic organizations.

Skills

Technical: Programming Languages (Python, PySpark, SQL, T-SQL) | Databricks | Azure Data Factory | Azure Synapse | AWS | Airflow | Amazon Redshift | Microsoft SQL Server | PostgreSQL | Microsoft Office Suite (Word, Excel, PowerPoint)

Languages: English | Portuguese

Expertise: Machine Learning | Data Engineering | Data Science | Analytics Engineering | ETL/ELT Pipelines | Data Modeling | Data Architecture | Data Warehousing | Distributed Systems | Data Pipelines | Data Processing | Data Infrastructure | Data Analytics | Data Exploration | Data Visualization | KPI Development | A/B Testing | Experimentation | Forecasting | Statistical Analysis | Feature Engineering | Model Deployment | Performance Optimization | Automation | Cross-Functional Collaboration | Stakeholder Engagement | Agile | Technical Documentation | Co-Authoring Literature

Professional Experience

Embracing Augmentation LLC | CA, USA February 2022 – April 2026

Data and Strategy Consultant

- Provide strategic operational guidance and data-driven insights to improve performance and streamline decision-making for visionary, founder-led companies with an established legacy of success.
- Drive executive decision-making and operational performance by analyzing performance metrics and providing data-informed recommendations for delegation and resource planning.
- Improved performance by designing adaptable structures that align team capacity with business goals.
- Optimized workforce productivity by assessing leadership gaps and executing focused development strategies.
- Streamlined operations and empowered executives by creating interventions and dashboards that track impact.


Pediatrix Medical Group | FL, USA September 2022 – January 2025

Senior Data Engineer

- Spearheaded end-to-end data migration and optimization initiatives, modernizing ETL pipelines and platform infrastructure to improve processing efficiency and support enterprise-scale healthcare analytics.
- Reduced downtime and ensured data continuity by executing ISDBI migration to a new informatics platform.
- Accelerated migration from Databricks to Synapse by constructing ETL pipelines using Python and PySpark.
- Cut ETL pipeline latency by 40% by improving CPU usage and processing capacity for healthcare analytics.
- Worked with teams to integrate migrated data pipelines with clinical reporting and operational systems.
- Supported advanced analytics and ML modeling by building infrastructure for clinical and operational datasets.
- Strengthened data governance, quality, and compliance by executing best practices across cloud ETL pipelines.


Packt Publishing | NJ, USA July 2022 – September 2023

Data Engineer, Co-Author

- Co-authored a technical book titled “**_Building ETL Pipelines in Python_**,” guiding large-scale data extraction and transformation while bridging academic and industry practices.
- Wrote 15 hands-on chapters covering PostgreSQL, AWS, and Airflow, enabling new programmers to build scalable Python ETL pipelines for petabyte-sized datasets.
- Improved data mining outcomes by training developers in extraction, transformation, and loading methods.
- Delivered reliable ETL pipelines and exhibited open-source libraries, enhancing enterprise data operations.


UNICEF HQ | Valencia, Spain August 2021 – August 2022

Data Engineer, Consultant

- Enabled seamless global data initiatives by designing and maintaining ETL pipelines and data integration architecture across cloud and on-premises systems, including secure data transfer solutions within the Information and Communication Technology Division (ICTD).
- Optimized ETL for global health indicators by building extraction infrastructure from SDMX, GDELT, and HELIX.
- Guaranteed enterprise database uptime and resilience by building high-availability and disaster recovery infrastructure using Failover Clusters, Availability Groups, Multi-Site clusters, and replication.
- Created unique ETL performance and data workflows for big data processing via extraction, transformation, and loading pipelines.


General Assembly | New York, NY May 2020 – September 2022

Data Scientist, Instructional Associate

- Taught machine learning and statistical modeling to improve data skills for corporate and individual learners.
- Translated advanced AI and clustering algorithms into learning materials, helping students assess data effectively.
- Supported diverse learners by delivering lectures on probability theory and neural networks.
- Enhanced workforce skills by creating digital courses that address machine learning and deep learning skill shortages.


UNICEF HQ | New York, NY February 2021 – July 2021

Data Scientist, Consultant

- Designed and built the Contextual Alert and Trend System (CATS), a proof-of-concept automated system using GDELT for near real-time country-level media monitoring, identifying trends and anomalies in online reports for the Risk Analysis and Preparedness Section (RAPS) within the Office of Emergency Operations (EMOPS).
- Attained continuous intelligence on GDELT events per day by building a cloud-based ETL pipeline (Azure Data Factory → Databricks → SQL Server) that improved data importation for near real-time alerting.
- Maintained uninterrupted GDELT operations by creating technical documentation for UNICEF’s data systems.


Healthy Together | San Francisco, CA November 2019 – October 2020

Data Scientist

- Built machine learning pipelines using symptom-checker geographic data to improve COVID-19 contact tracing.
- Improved SMS engagement by 25% by developing a Multi-Armed Bandit optimization model.
- Imported production data into Amazon Redshift with Airflow, enabling near real-time geospatial insights.
- Delivered timely public health insights by deploying business intelligence reporting that tracked health metrics.


Robert Half | Los Angeles, CA November 2018 – January 2019

Database Analyst

- Interpreted data to provide insights, guiding stakeholders in optimizing customer retention and service efficiency.
- Increased financial insight by assessing customer feedback using statistical modeling and regression analysis.
- Mined company datasets to guide policy cutoff, improving consistency in Customer Relations decisions.


HBS Management Group LLC | Los Angeles, CA October 2017 – November 2018

Business Data Administrator

- Maintained accurate financial data and compliance reports by reconstructing records and monitoring AR/AP.
- Cut reconciliation errors by designing validation checkpoints for physical-to-digital financial data migration.
- Enhanced vendor efficiency by analyzing service-to-cost ratios for executive decision-making.


Education & Certification

- Master of Public Policy (MPP), University of Southern California | Beginning Fall 2026 (Dean Merit Scholarship)

- Data Science Academic Certification, Springboard | 2019

- Bachelor of Science, Anatomy & Physiology, California Polytechnic State University | 2015 (Cum Laude)