Healthcare Data
Professional
→ Data Engineer

I bring 10+ years inside healthcare (payers, hospitals, and health tech) to building the data systems and pipelines that help health organizations actually use their data. I'm targeting Senior Data Analyst, ETL Developer, Data Integration Engineer, and Operations Analyst roles, with Data Engineer as my end goal.

🏥
Domain Experience
10+ yrs · Payers, Hospitals, Health Tech, Clinical Practices
🎓
Education
MS Health Informatics — Hofstra University
2026
🔬
Current Internship
Northwell Health — LLM Evaluation & Clinical AI
⚙️
Focus
Operations Analytics · ETL Pipelines · Data Integration
TaraJee Clarke

Healthcare Ops to
Data Engineering

Ten years inside healthcare taught me one thing clearly: the data was always there. The infrastructure to trust it wasn't.

My field is health informatics, sitting at the intersection of operations analytics and data engineering. I came up through the operational side: payers, hospitals, health tech, revenue cycle, claims analytics, clinical operations. I didn't learn healthcare from a textbook. I learned it from the inside. I know what the data is supposed to say before I ever build the system around it.

Now I'm building the infrastructure layer. I design ETL pipelines that move raw data from healthcare source systems, clean it, transform it, and load it into warehouses that operations and clinical teams can actually trust. I'm expanding that foundation through a growing portfolio of data engineering projects: a Medicare claims data warehouse, an automated ETL pipeline with orchestration and monitoring, and a cloud-native integration platform on Snowflake.

If you're building data systems in healthcare and want someone who understands both the data and the domain, let's connect.

Location
New York — open to relocate
Open To
Remote · Hybrid
Available For
Senior Data Analyst · ETL Developer · Data Integration Engineer · Operations Analyst
Prior Experience Spans
Claims analytics, Clinical operations, Health IT
Industries
Payers · Hospitals · Health Tech Startups
Leadership
VP, Health Technology Analytics & Innovation — Hofstra HTAi
Portfolio

Selected Projects

Six healthcare data projects spanning ETL pipelines, data warehousing, clinical ML, and population health dashboards, built on real public datasets across payer, clinical, and public health domains.

02
Complete

Drug Interaction ML Classification Pipeline

Developed a machine learning pipeline to classify clinically significant drug-drug interactions using the TwoSIDES pharmacovigilance dataset. Engineered features from adverse event co-occurrence patterns and trained logistic regression and random forest classifiers. Evaluated with AUC-ROC, precision-recall curves, and cross-validation with a focus on minimizing false negatives given clinical risk implications.

100K+
Records Processed
ROC-AUC ~0.69
Model Performance
2 Models
LR + Random Forest
Python · scikit-learn · pandas · TwoSIDES · Logistic Regression · Random Forest
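The focus on minimizing false negatives can be illustrated with a small threshold-selection sketch. This is not the project's code: the labels and scores below are made-up toy values, and `pick_threshold` is a hypothetical helper that picks the highest score cutoff still meeting a target recall, using only the standard library rather than scikit-learn.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are flagged positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(y_true, scores, min_recall):
    """Return (threshold, recall, precision) for the highest threshold
    whose recall is at least min_recall, or None if none qualifies."""
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if recall >= min_recall:
            precision = tp / (tp + fp)
            return threshold, recall, precision
    return None


# Toy data (illustrative only): 1 = clinically significant interaction.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

chosen = pick_threshold(y_true, scores, min_recall=0.75)
```

Lowering the cutoff until recall reaches the target trades precision for fewer missed interactions, which is the usual direction to lean when a false negative carries clinical risk.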
03
Complete

Cardiovascular Disease Risk Prediction

Built a cardiovascular disease risk prediction model using logistic regression across three public clinical datasets. Performed data harmonization, missing value imputation, and feature selection to produce a unified modeling dataset. Evaluated model calibration and discrimination across demographic subgroups to assess equity implications of risk score deployment in clinical settings.

3 Datasets
Framingham · UCI · Cardio Train
Cross-validated
Generalizability Test
Equity Analysis
Subgroup Fairness
Python · scikit-learn · pandas · Logistic Regression · Model Calibration
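One simple way to look at calibration across subgroups is to compare the mean predicted risk with the observed event rate in each group. The sketch below assumes that approach; the group labels, risks, and outcomes are invented for illustration, and `calibration_by_group` is a hypothetical helper rather than the project's actual evaluation code.

```python
from collections import defaultdict


def calibration_by_group(records):
    """records: iterable of (group, predicted_risk, outcome) with outcome 0 or 1.

    Returns, per group, the mean predicted risk, the observed event rate,
    and their difference (positive gap = risk is over-predicted).
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of risks, events, count]
    for group, risk, outcome in records:
        entry = totals[group]
        entry[0] += risk
        entry[1] += outcome
        entry[2] += 1

    summary = {}
    for group, (risk_sum, events, count) in totals.items():
        mean_predicted = risk_sum / count
        observed_rate = events / count
        summary[group] = {
            "mean_predicted": mean_predicted,
            "observed_rate": observed_rate,
            "gap": mean_predicted - observed_rate,
        }
    return summary


# Toy records (illustrative only).
records = [
    ("F", 0.2, 0), ("F", 0.4, 1), ("F", 0.3, 0), ("F", 0.1, 0),
    ("M", 0.5, 1), ("M", 0.3, 1), ("M", 0.2, 0), ("M", 0.2, 0),
]
report = calibration_by_group(records)
```

A gap near zero in one subgroup and a clearly negative gap in another would suggest the score under-predicts risk for the second group, which is the kind of difference an equity review would flag.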
04
In Progress

Medicare Claims Data Warehouse & Analytics Pipeline

Built a two-layer PostgreSQL data warehouse on real CMS Medicare claims data: a raw layer where data lands as-is and a clean layer where it's transformed and structured. Wrote a Python pipeline that downloads, cleans, and loads data automatically. Produced ten progressively complex SQL queries covering window functions, CTEs, and PMPM calculations. Final deliverable is a live Looker Studio dashboard with four panels: executive scorecard, PMPM trend, claims by service line, and top high-cost members, with date and service line filters.

2-Layer
Raw + Clean Warehouse
PMPM + CTEs
Advanced SQL
Looker Studio
Live Dashboard
Python · pandas · PostgreSQL · Advanced SQL · Looker Studio · CMS Medicare
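The PMPM pattern mentioned above (total paid divided by member months, with a month-over-month trend) can be sketched with CTEs and a window function. This is an illustration only: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table names, column names, and rows are hypothetical, not the project's schema or CMS data.

```python
import sqlite3

# Hypothetical clean-layer tables with toy rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clean_claims (member_id TEXT, service_month TEXT, paid_amount REAL);
CREATE TABLE clean_enrollment (member_id TEXT, coverage_month TEXT);
INSERT INTO clean_claims VALUES
  ('M1', '2023-01', 300.0), ('M2', '2023-01', 100.0),
  ('M1', '2023-02', 150.0), ('M3', '2023-02', 300.0);
INSERT INTO clean_enrollment VALUES
  ('M1', '2023-01'), ('M2', '2023-01'),
  ('M1', '2023-02'), ('M2', '2023-02'), ('M3', '2023-02');
""")

PMPM_SQL = """
WITH paid AS (            -- total paid per service month
    SELECT service_month AS month, SUM(paid_amount) AS total_paid
    FROM clean_claims
    GROUP BY service_month
),
members AS (              -- member months per coverage month
    SELECT coverage_month AS month, COUNT(DISTINCT member_id) AS member_months
    FROM clean_enrollment
    GROUP BY coverage_month
),
monthly AS (              -- PMPM = total paid / member months
    SELECT m.month,
           COALESCE(p.total_paid, 0) * 1.0 / m.member_months AS pmpm
    FROM members m
    LEFT JOIN paid p ON p.month = m.month
)
SELECT month,
       pmpm,
       pmpm - LAG(pmpm) OVER (ORDER BY month) AS pmpm_change
FROM monthly
ORDER BY month
"""

pmpm_rows = conn.execute(PMPM_SQL).fetchall()
for month, pmpm, change in pmpm_rows:
    print(month, pmpm, change)
```

Driving the query from enrollment rather than claims keeps months with covered members but no claims in the trend, with a PMPM of zero instead of a missing row.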
05
In Progress

Automated Healthcare ETL Pipeline with Orchestration & Monitoring

A fully automated three-stage Prefect pipeline that runs on a schedule: Extract pulls fresh CMS data, Transform applies data quality checks and logs bad records to an error table, and Load pushes clean data incrementally into PostgreSQL so only new records are added each run. Includes retry logic for stage failures. A Plotly Dash monitoring dashboard surfaces pipeline run history, records processed per run, error rate over time, and a data quality summary panel.

3-Stage
Extract · Transform · Load
Prefect
Scheduled Orchestration
Dash Monitor
Live Pipeline Health
Python · pandas · Prefect · PostgreSQL · Plotly Dash · Incremental Load
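The three stages described above can be sketched in plain Python. This is a simplified stand-in, not the project's pipeline: it uses no Prefect, swaps PostgreSQL for in-memory SQLite, and hard-codes a few made-up rows in place of a CMS download. All function, table, and column names are hypothetical.

```python
import sqlite3
import time


def extract():
    """Stand-in for the download step: returns hard-coded sample rows."""
    return [
        {"claim_id": "C1", "member_id": "M1", "paid_amount": "120.50"},
        {"claim_id": "C2", "member_id": "", "paid_amount": "80.00"},   # missing member
        {"claim_id": "C3", "member_id": "M2", "paid_amount": "abc"},   # bad amount
        {"claim_id": "C4", "member_id": "M3", "paid_amount": "45.00"},
    ]


def transform(rows):
    """Apply quality checks; return (clean rows, rejected rows with a reason)."""
    clean, errors = [], []
    for row in rows:
        if not row["member_id"]:
            errors.append((row["claim_id"], "missing member_id"))
            continue
        try:
            amount = float(row["paid_amount"])
        except ValueError:
            errors.append((row["claim_id"], "non-numeric paid_amount"))
            continue
        clean.append((row["claim_id"], row["member_id"], amount))
    return clean, errors


def load(conn, clean, errors):
    """Insert only claim_ids not already loaded; return the count of new rows."""
    before = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany("INSERT OR IGNORE INTO claims VALUES (?, ?, ?)", clean)
        conn.executemany("INSERT INTO etl_errors VALUES (?, ?)", errors)
    after = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
    return after - before


def with_retries(func, attempts=3, delay_seconds=0.0):
    """Call func, retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


def run_pipeline(conn):
    rows = with_retries(extract)
    clean, errors = transform(rows)
    return with_retries(lambda: load(conn, clean, errors))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (claim_id TEXT PRIMARY KEY, member_id TEXT, paid_amount REAL);
CREATE TABLE etl_errors (claim_id TEXT, reason TEXT);
""")
first_run = run_pipeline(conn)   # loads the two valid claims
second_run = run_pipeline(conn)  # same input again: nothing new to add
```

The primary key plus `INSERT OR IGNORE` is what makes the load incremental in this sketch; in PostgreSQL the closest equivalent is `INSERT ... ON CONFLICT DO NOTHING`. An orchestrator such as Prefect would typically take over the scheduling and retry roles that `with_retries` plays here.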
06
In Progress

End-to-End Healthcare Data Integration Platform on Snowflake

Cloud-native data integration platform on Snowflake with three schemas — raw, staging, and analytics. A Python ingestion script loads CMS data into the raw schema via the Snowflake connector. SQL transformation scripts promote data through staging to analytics, cleaning, joining, and computing metrics at each layer. Snowflake Tasks automate transformations on a schedule. Three live Tableau Public dashboards serve as the front end: an executive scorecard, a clinical operations dashboard, and a payer analytics dashboard.

3-Schema
Raw · Staging · Analytics
Snowflake Tasks
Automated Transforms
3 Dashboards
Executive · Clinical · Payer
Snowflake · Python · Advanced SQL · Tableau Public · Snowflake Tasks · CMS Medicare
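As a rough illustration of scheduled promotion between schemas, the sketch below builds Snowflake `CREATE TASK` statements as strings: a root task on a cron schedule and a child task chained with `AFTER`. It does not connect to Snowflake, and the helper `build_task_ddl`, the task names, the warehouse name, and the table names are all assumptions made for the example rather than details of the project.

```python
def build_task_ddl(name, warehouse, body_sql, schedule=None, after=None):
    """Return a CREATE TASK statement.

    Pass exactly one of `schedule` (for a root task) or `after`
    (the name of the predecessor task, for a child task).
    """
    if (schedule is None) == (after is None):
        raise ValueError("pass exactly one of schedule or after")
    if schedule is not None:
        trigger = f"SCHEDULE = '{schedule}'"
    else:
        trigger = f"AFTER {after}"
    return "\n".join([
        f"CREATE OR REPLACE TASK {name}",
        f"  WAREHOUSE = {warehouse}",
        f"  {trigger}",
        "AS",
        f"  {body_sql}",
    ])


# Hypothetical names throughout: etl_wh, raw.claims, staging.claims, etc.
staging_ddl = build_task_ddl(
    name="promote_raw_to_staging",
    warehouse="etl_wh",
    schedule="USING CRON 0 6 * * * UTC",  # daily at 06:00 UTC
    body_sql="INSERT INTO staging.claims SELECT DISTINCT * FROM raw.claims",
)
analytics_ddl = build_task_ddl(
    name="promote_staging_to_analytics",
    warehouse="etl_wh",
    after="promote_raw_to_staging",
    body_sql=(
        "INSERT INTO analytics.monthly_paid "
        "SELECT service_month, SUM(paid_amount) "
        "FROM staging.claims GROUP BY service_month"
    ),
)

print(staging_ddl)
print(analytics_ddl)
```

Newly created tasks start out suspended, so each one would also need an `ALTER TASK ... RESUME` (child tasks first, then the root) before the schedule takes effect.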

Health Informatics Analysis

Structured analyses of real-world healthcare challenges through an informatics, ethics, and policy lens.

Health Informatics & Strategy

Telemedicine as a Health Technology Solution in Jamaica

Examines whether telemedicine can address access, cost, and continuity-of-care challenges in Jamaica, particularly for rural and underserved communities, while remaining aligned with health informatics principles.

  • Telemedicine reduces geographic and transportation barriers, with highest impact in rural communities lacking specialist access
  • Continuity of care for chronic disease (diabetes, CVD) improves with remote monitoring and regular follow-up support
  • Effective adoption requires interoperable EHRs, digital infrastructure investment, and regulatory framework development
  • Telemedicine is a policy and change-management challenge as much as a technology one
Telemedicine · Health Equity · Interoperability · EHR · Caribbean Health Systems
Health Informatics, Security & Privacy

Strengthening Cybersecurity in Healthcare Facilities

Investigates how healthcare organizations can improve their cybersecurity posture to reduce the risk and impact of cyberattacks while protecting sensitive patient data and maintaining clinical operations.

  • Phishing, ransomware, and insider threats are the dominant attack vectors, often exploiting human behavior over technical gaps
  • EHR outages from cyberattacks directly delay patient care, making cybersecurity a patient-safety issue, not just an IT concern
  • The NIST Cybersecurity Framework is applied to structure recommendations across prevention, detection, response, and recovery
  • A multi-layered strategy combining access controls, staff training, and incident response planning reduces operational impact
Cybersecurity · NIST Framework · PHI Protection · Ransomware · Healthcare IT
Health Informatics, Ethics & Clinical Systems

Profit vs. Care: Ethical Risks in Clinical Decision Support Systems

Examines whether CDSS tools consistently prioritize patient care over financial interests as they become increasingly commercialized, with analysis of IBM Watson for Oncology and Epic's sepsis model as real-world failures.

  • Vendor financial relationships with pharma can shape CDSS algorithms toward high-cost treatments, often without clinician visibility
  • IBM Watson for Oncology recommended unsafe treatments due to narrow training data and lack of independent validation
  • Epic's sepsis model raised concerns about high false-alert rates and limited transparency into recommendation logic
  • Algorithmic transparency, independent audits, and cost-aware recommendations are essential for ethical CDSS design
CDSS · Responsible AI · Algorithm Transparency · Health Ethics · Clinical Governance
Technical Skills

Stack & Expertise

Built through graduate coursework, self-directed learning, and applied project work across the full healthcare data lifecycle.

Languages & Querying
  • SQL (Advanced — CTEs, Window Functions, PMPM)
  • Python
  • pandas / NumPy
ETL & Data Engineering
  • ETL / ELT Pipeline Design
  • Prefect (Orchestration)
  • Incremental Loading
  • REST API Integration
  • Data Quality Testing
  • Batch Pipeline Architecture
Databases & Warehousing
  • PostgreSQL / MySQL
  • Snowflake
  • DuckDB
  • Multi-layer Warehouse Design
  • Schema Design
Visualization & Dashboards
  • Tableau Public
  • Looker Studio
  • Plotly / Plotly Dash
  • Streamlit
  • Flask
  • KPI & Scorecard Design
Healthcare Domain
  • CMS Medicare / Medicaid Data
  • Claims Analytics & PMPM
  • ICD-10 / CPT / NPI / HEDIS
  • EHR / Health IT Systems
  • ICH E2B(R3)
  • SDOH Data Integration
Tools & Workflow
  • Git / GitHub
  • Docker / Docker Compose
  • VS Code
  • Jupyter Notebooks
  • scikit-learn (ML)
  • LLM Evaluation & AI QA
Experience

Career Timeline

A decade of healthcare operations and data analytics, now building toward data engineering.

Jan 2026 – Present
Data Analyst Intern — LLM Quality Assurance
Health System · Hybrid · New York, USA
Evaluated and QA'd an LLM classifying internal healthcare topics for accuracy and workflow alignment. Identified misclassification patterns and delivered structured feedback to improve model performance. Contributed to model validation documentation and responsible AI governance frameworks.
Sep 2025 – Dec 2025
Data Analyst Intern — Health Informatics
Hospital · Hybrid · New York, USA
Built SQL queries and Tableau dashboards for quality and KPI monitoring across hospital departments. Conducted workflow and performance analysis to identify operational improvement opportunities. Supported HIPAA compliance activities with direct EHR and health IT system exposure.
Oct 2023 – May 2024
Data Operations Analyst — Claims & Payment Investigation
Health Tech Startup · Remote · New York, USA
Engineered advanced SQL (CTEs, window functions) to validate high-volume claims data, reducing errors by 70%. Conducted cohort and PMPM analysis to surface cost leakage and close performance gaps. Built KPI monitoring datasets supporting operational optimization across service lines.
May 2023 – Oct 2023
Clinical Data & Operations Coordinator
Health Tech Startup · Remote · New York, USA
Analyzed intake and utilization data across Epic and Salesforce to assess workflow efficiency and patient throughput. Built SQL datasets tracking KPI stability and service-line performance trends. Conducted variance analysis to evaluate the operational impact of workflow and process changes.
Sep 2021 – May 2023
Practice Operations Coordinator
Clinical Practice · On-site · Florida, USA
Analyzed reimbursement and denial data to identify revenue drivers and operational bottlenecks. Developed KPI dashboards tracking revenue cycle performance and scheduling efficiency. Delivered data-driven insights that improved collections predictability and workflow performance.
Aug 2016 – Aug 2021
Data Operations Analyst — Claims & Risk Management
Insurance · Hybrid · Jamaica
Analyzed multi-year claims datasets to identify cost, utilization, and risk trends for actuarial strategy. Built structured datasets supporting actuarial forecasting and long-term pricing models. Improved operational efficiency by 15% through performance monitoring and trend analysis.
Aug 2013 – Sep 2016
Claims Data Associate
Insurance · Jamaica
Processed and validated claims data across multiple lines of business. Built foundational skills in data entry, claims review, and operational reporting within a high-volume insurance environment.
Contact

Let's Work Together

If you're building data systems in healthcare and want someone who understands both the data and the domain, let's connect.