I bring 10+ years inside healthcare, across payers, hospitals, and health tech, to building the data systems and pipelines that help health organizations actually use their data. I'm targeting Senior Data Analyst, ETL Developer, Data Integration Engineer, and Operations Analyst roles, with Data Engineer as my end goal.
Ten years inside healthcare taught me one thing clearly: the data was always there. The infrastructure to trust it wasn't.
My field is health informatics, at the intersection of operations analytics and data engineering. I came up through the operational side: payers, hospitals, health tech, revenue cycle, claims analytics, clinical operations. I didn't learn healthcare from a textbook. I learned it from the inside. I know what the data is supposed to say before I ever build the system around it.
Now I'm building the infrastructure layer. I design ETL pipelines that move raw data from healthcare source systems, clean it, transform it, and load it into warehouses that operations and clinical teams can actually trust. I'm expanding that foundation through a growing portfolio of data engineering projects: a Medicare claims data warehouse, an automated ETL pipeline with orchestration and monitoring, and a cloud-native integration platform on Snowflake.
If you're building data systems in healthcare and want someone who understands both the data and the domain, let's connect.
Six healthcare data projects spanning ETL pipelines, data warehousing, clinical ML, and population health dashboards, all built on real public datasets across payer, clinical, and public health domains.
Built a production-style data engineering pipeline ingesting adverse drug event reports from the openFDA API into an ICH E2B(R3)-aligned MySQL schema. Designed a normalized relational data model capturing drug, reaction, patient, and report entities. Surfaced drug safety signals through a Flask/Plotly interactive dashboard enabling exploration by drug class, reaction type, and report volume over time.
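To illustrate the normalization step, here is a minimal sketch (not the project's actual code) of how one openFDA drug event record might be split into report, patient, drug, and reaction rows. The JSON keys are openFDA field names; the function name, the output table names, and the column names are hypothetical, and the sample record in the usage note is made up.

```python
from typing import Any


def flatten_report(event: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Split one openFDA drug event record into rows for four hypothetical tables."""
    report_id = event.get("safetyreportid")
    patient = event.get("patient", {})

    report_row = {
        "report_id": report_id,
        "receive_date": event.get("receivedate"),
        "serious": event.get("serious"),
    }
    patient_row = {
        "report_id": report_id,
        "sex": patient.get("patientsex"),
        "onset_age": patient.get("patientonsetage"),
    }
    # One row per drug and per reaction, keyed back to the report.
    drug_rows = [
        {
            "report_id": report_id,
            "drug_seq": i,
            "product": drug.get("medicinalproduct"),
            "characterization": drug.get("drugcharacterization"),
        }
        for i, drug in enumerate(patient.get("drug", []), start=1)
    ]
    reaction_rows = [
        {
            "report_id": report_id,
            "reaction_seq": i,
            "meddra_pt": reaction.get("reactionmeddrapt"),
        }
        for i, reaction in enumerate(patient.get("reaction", []), start=1)
    ]
    return {
        "report": [report_row],
        "patient": [patient_row],
        "drug": drug_rows,
        "reaction": reaction_rows,
    }
```

Calling `flatten_report` on a record with two entries under `patient.drug` and one under `patient.reaction` yields two drug rows and one reaction row, each carrying the same `report_id`, which is what lets the relational tables be joined back together.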
Developed a machine learning pipeline to classify clinically significant drug-drug interactions using the TwoSIDES pharmacovigilance dataset. Engineered features from adverse event co-occurrence patterns and trained logistic regression and random forest classifiers. Evaluated with AUC-ROC, precision-recall curves, and cross-validation with a focus on minimizing false negatives given clinical risk implications.
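One way to act on "minimize false negatives" is to pick the classification threshold from a recall target rather than defaulting to 0.5. The sketch below is illustrative only and is not taken from the project: the function names, the recall target, and the toy labels and scores in the usage note are all assumptions.

```python
def recall_at(threshold: float, y_true: list[int], scores: list[float]) -> float:
    """Recall when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_threshold(y_true: list[int], scores: list[float], min_recall: float) -> float:
    """Return the highest threshold whose recall still meets min_recall.

    Recall can only rise as the threshold drops, so the first candidate
    that meets the target while scanning downward is the highest one.
    """
    for t in sorted(set(scores), reverse=True):
        if recall_at(t, y_true, scores) >= min_recall:
            return t
    # No threshold meets the target (for example, no positives present).
    return min(scores)
```

With toy labels `[1, 1, 1, 0, 0]` and scores `[0.9, 0.6, 0.3, 0.5, 0.1]`, a recall target of 1.0 forces the threshold down to 0.3, accepting one false positive so that no true interaction is missed.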
Built a cardiovascular disease risk prediction model using logistic regression across three public clinical datasets. Performed data harmonization, missing value imputation, and feature selection to produce a unified modeling dataset. Evaluated model calibration and discrimination across demographic subgroups to assess equity implications of risk score deployment in clinical settings.
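A simple subgroup calibration check compares the mean predicted risk with the observed event rate inside each group. The following is a rough sketch under assumed inputs, not the project's evaluation code: the function name and the `(group, predicted_risk, outcome)` tuple layout are hypothetical.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Compare mean predicted risk with observed event rate per subgroup.

    records: iterable of (group, predicted_risk, outcome) with outcome in {0, 1}.
    A positive gap means the model over-predicts risk for that group.
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of risks, sum of outcomes, count]
    for group, risk, outcome in records:
        acc = totals[group]
        acc[0] += risk
        acc[1] += outcome
        acc[2] += 1

    summary = {}
    for group, (risk_sum, outcome_sum, n) in totals.items():
        mean_predicted = risk_sum / n
        observed_rate = outcome_sum / n
        summary[group] = {
            "n": n,
            "mean_predicted": mean_predicted,
            "observed_rate": observed_rate,
            "gap": mean_predicted - observed_rate,
        }
    return summary
```

This only checks calibration in the large; a fuller assessment would also bin predictions by risk level within each subgroup and report discrimination (for example AUC) per group.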
Built a two-layer PostgreSQL data warehouse on real CMS Medicare claims data: a raw layer where data lands as-is and a clean layer where it's transformed and structured. Wrote a Python pipeline that downloads, cleans, and loads data automatically. Produced ten progressively complex SQL queries covering window functions, CTEs, and per-member-per-month (PMPM) calculations. The final deliverable is a live Looker Studio dashboard with four panels (executive scorecard, PMPM trend, claims by service line, and top high-cost members) plus date and service line filters.
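For readers unfamiliar with PMPM: it is total paid amount divided by member months. The sketch below shows the arithmetic in Python rather than SQL, purely as an illustration; the function name, input shapes, and the figures in the usage note are assumptions and not data from the warehouse.

```python
from collections import defaultdict


def pmpm_by_month(claims, enrollment):
    """Per-member-per-month cost: paid dollars in a month / members enrolled that month.

    claims: iterable of (month, paid_amount) pairs.
    enrollment: dict mapping month -> enrolled member count.
    Months with zero enrollment are skipped to avoid dividing by zero.
    """
    paid = defaultdict(float)
    for month, amount in claims:
        paid[month] += amount
    return {
        month: paid[month] / members
        for month, members in sorted(enrollment.items())
        if members > 0
    }
```

For example, 150.00 paid across 3 enrolled members in a month gives a PMPM of 50.00, regardless of how many of those members actually filed a claim.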
A fully automated three-stage Prefect pipeline that runs on a schedule: Extract pulls fresh CMS data; Transform applies data quality checks and logs bad records to an error table; Load pushes clean data incrementally into PostgreSQL so only new records are added each run. Includes retry logic for stage failures. A Plotly Dash monitoring dashboard surfaces pipeline run history, records processed per run, error rate over time, and a data quality summary panel.
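The Transform and Load behavior described above can be sketched as follows. This is a simplified stand-in, not the pipeline itself: it uses an in-memory SQLite database instead of PostgreSQL so it runs anywhere, it leaves out the Prefect orchestration and retries, and the table names, column names, and validation rules are hypothetical. In PostgreSQL the same idempotent insert is typically written with `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3


def transform(records):
    """Split a batch into clean rows and (raw_record, reason) error rows."""
    clean, errors = [], []
    for rec in records:
        amount = rec.get("paid_amount")
        if not rec.get("claim_id"):
            errors.append((str(rec), "missing claim_id"))
        elif not isinstance(amount, (int, float)) or amount < 0:
            errors.append((str(rec), "invalid paid_amount"))
        else:
            clean.append((rec["claim_id"], float(amount)))
    return clean, errors


def load(conn, clean, errors):
    """Insert only unseen claim_ids; return how many new rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(claim_id TEXT PRIMARY KEY, paid_amount REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_errors (raw_record TEXT, reason TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    # The primary key makes reruns idempotent: existing claim_ids are skipped.
    conn.executemany("INSERT OR IGNORE INTO claims VALUES (?, ?)", clean)
    conn.executemany("INSERT INTO load_errors VALUES (?, ?)", errors)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    return after - before
```

Running the same batch twice adds rows only the first time, which is the property that makes scheduled reruns and retries safe.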
Cloud-native data integration platform on Snowflake with three schemas: raw, staging, and analytics. A Python ingestion script loads CMS data into the raw schema via the Snowflake connector. SQL transformation scripts promote data through staging to analytics, cleaning, joining, and computing metrics at each layer. Snowflake Tasks automate transformations on a schedule. Three live Tableau Public dashboards serve as the front end: an executive scorecard, a clinical operations dashboard, and a payer analytics dashboard.
Structured analyses of real-world healthcare challenges through an informatics, ethics, and policy lens.
Examines whether telemedicine can address access, cost, and continuity-of-care challenges in Jamaica, particularly for rural and underserved communities, while remaining aligned with health informatics principles.
Investigates how healthcare organizations can improve their cybersecurity posture to reduce the risk and impact of cyberattacks while protecting sensitive patient data and maintaining clinical operations.
Examines whether clinical decision support system (CDSS) tools consistently prioritize patient care over financial interests as they become increasingly commercialized, with analysis of IBM Watson for Oncology and Epic's sepsis model as real-world failures.
Built through graduate coursework, self-directed learning, and applied project work across the full healthcare data lifecycle.
A decade of healthcare operations and data analytics, now building toward data engineering.