MS Health Informatics · Hofstra University

Building Healthcare Data Pipelines & Clinical ML

I bridge 10+ years of healthcare operations with modern data engineering and clinical machine learning, turning messy healthcare data into production-grade pipelines and predictive models.

🏥
Domain Experience
10+ yrs · Payers, Hospitals, Clinical Practices
🎓
Education
MS Health Informatics — Hofstra University
2025
🔬
Internship
Northwell Health — LLM Evaluation & Clinical AI
⚙️
Focus
Analytics Engineering · Clinical ML · Health AI

Healthcare Ops Veteran Turned Data Scientist

I started my career in healthcare operations, working across payers, hospitals, and clinical practices, and spent a decade learning how healthcare data is actually generated, broken, and used at the ground level.

Now I'm translating that domain depth into a technical skill set: building end-to-end data pipelines aligned to clinical standards, training ML models on real-world healthcare datasets, and evaluating LLMs in clinical contexts.

My edge isn't just the code — it's that I've lived inside the systems the data comes from. I know why claims get denied, what makes a clinical workflow break, and what a data model needs to be actually useful to a health system.

I'm actively seeking a Data Scientist role where I can build data infrastructure with real clinical impact.

SQL / dbt · Python · Clinical ML · Health AI · EHR / Claims Data · FHIR / HL7 · NLP · ICH E2B(R3)
Current Status
MS Candidate, graduating 2025
Focus
Data Science with Analytics Engineering Depth
Location
New York
Available For
Full-time roles · Analytics Engineering · Data Science · Data Analyst
Prior Experience Spans
Claims analytics, actuarial datasets, clinical operations, health IT
Portfolio

Selected Projects

End-to-end data engineering, clinical ML, and AI evaluation work built across graduate research and real-world internship experience.

02
Complete

Drug Interaction ML Classification Pipeline

Trained logistic regression and random forest classifiers on the TwoSIDES dataset to predict adverse drug-drug interactions. The pipeline runs from feature engineering through model evaluation, with a focus on clinical interpretability.

Python · scikit-learn · TwoSIDES Dataset · Logistic Regression · Random Forest
03
Complete

Cardiovascular Disease Risk Prediction

Built a logistic regression model predicting CVD risk using three independent public clinical datasets. Validated across datasets to test generalizability — a key challenge in clinical predictive modeling.

Python · Logistic Regression · Clinical Datasets · Cross-validation
04
Internship · Northwell Health

LLM Evaluation & Misclassification Analysis

Evaluated LLM performance on clinical tasks at Northwell Health. Conducted structured misclassification analysis to surface failure modes, edge cases, and bias patterns — directly informing responsible deployment decisions.

LLM Evaluation · Clinical NLP · Bias Analysis · Health AI
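A minimal, illustrative sketch of what a structured misclassification analysis can look like. This is not the internship code: the record fields (`true`, `pred`, `group`), the label names, and the toy data are all hypothetical, chosen only to show how errors can be tallied by confusion pair and by subgroup.

```python
from collections import Counter, defaultdict


def misclassification_report(records):
    """Tally errors by (true, predicted) pair and compute per-group error rates.

    Each record is a dict with hypothetical keys 'true', 'pred', and 'group'.
    """
    confusions = Counter()      # counts of each (true label, predicted label) error
    totals = defaultdict(int)   # number of records seen per group
    errors = defaultdict(int)   # number of misclassified records per group

    for r in records:
        totals[r["group"]] += 1
        if r["true"] != r["pred"]:
            confusions[(r["true"], r["pred"])] += 1
            errors[r["group"]] += 1

    error_rates = {g: errors[g] / totals[g] for g in totals}
    return confusions, error_rates


# Toy, made-up records for illustration only.
records = [
    {"true": "urgent", "pred": "urgent", "group": "A"},
    {"true": "urgent", "pred": "routine", "group": "A"},
    {"true": "routine", "pred": "urgent", "group": "A"},
    {"true": "routine", "pred": "routine", "group": "B"},
    {"true": "urgent", "pred": "routine", "group": "B"},
    {"true": "routine", "pred": "routine", "group": "B"},
]

confusions, error_rates = misclassification_report(records)
for pair, count in confusions.most_common():
    print(pair, count)
for group, rate in sorted(error_rates.items()):
    print(group, round(rate, 2))
```

Sorting confusion pairs by frequency points to the dominant failure modes, while a gap in error rate between groups is a prompt for closer review rather than proof of bias on its own.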
05
In Progress

dbt Claims Analytics Pipeline

Building an analytics engineering pipeline using CMS Medicare data. Applying dbt to model claims data from raw staging to production-ready mart tables with data quality tests and documentation.

dbt · CMS Medicare · SQL · Analytics Engineering · Data Modeling
06
In Progress

Clinical NLP / AI Evaluation Framework

Developing a structured evaluation framework for clinical NLP using scispaCy and MIMIC-III discharge summaries. Builds on Northwell internship methodology to create reusable evaluation tooling for clinical AI.

scispaCy · MIMIC-III · Clinical NLP · AI Evaluation · Python
Technical Skills

Stack & Expertise

Built through graduate coursework, self-directed learning, and applied project work across the full data lifecycle.

Data Engineering
  • SQL / PostgreSQL / MySQL
  • dbt (data build tool)
  • ETL Pipeline Design
  • REST API Integration
  • Data Modeling
  • Schema Design
Machine Learning
  • scikit-learn
  • Logistic Regression
  • Random Forest
  • Model Evaluation
  • Feature Engineering
  • Clinical ML Pipelines
Clinical & Health Data
  • FHIR / HL7
  • ICD-10 / CPT / SNOMED
  • ICH E2B(R3)
  • EHR / Claims Data
  • CMS Medicare Datasets
  • MIMIC-III
AI / NLP
  • LLM Evaluation
  • scispaCy / Clinical NLP
  • Misclassification Analysis
  • Responsible AI
  • Flask / Plotly Dashboards
  • Python
Experience

Career Timeline

A decade of healthcare operations, now intersecting with data science and health AI.

2024 – 2025
Graduate Data Science Intern
Northwell Health
Evaluated large language model performance on clinical tasks. Conducted misclassification and bias analysis to inform responsible AI deployment decisions within one of the nation's largest health systems.
2023 – Present
MS Candidate, Health Informatics
Hofstra University
Graduate coursework spanning data engineering, clinical ML, NLP, health AI governance, and informatics standards. Building a portfolio of end-to-end projects using real-world healthcare datasets.
Prior · 10+ Years
Healthcare Operations Leader
Payers · Hospitals · Clinical Practices
Broad operational experience spanning health insurance payers, hospital systems, and clinical practices. Roles included claims analytics, actuarial dataset work, and clinical operations management — building deep domain understanding of how healthcare data flows, breaks, and gets used in real-world settings.
Contact

Let's Work Together

I'm actively seeking Data Scientist and Analytics Engineering roles. If you're building something at the intersection of health data and AI, I'd love to talk.