Data engineer with a statistics background and seven years of moving data from
"raw and suspicious" to "trusted and reportable." Strongest where SQL, dbt,
Airflow, and Snowflake meet — and where someone needs to write the contract
that keeps the numbers honest.
Experience
Data Engineer · Taskrabbit
2021 — Present · Bay Area, CA
- Own large portions of the analytics data platform on Snowflake, dbt, and Airflow 3; design and maintain the bronze → silver → gold modeling layers consumed by Growth, Ops, Finance, and Trust.
- Authored 200+ dbt models and a shared library of macros & tests that standardized incremental logic, surrogate keys, and SCD‑2 handling across teams.
- Designed the canonical task‑lifecycle fact table — one row per task with all state transitions — eliminating recurring cross‑team disputes over the definition of a completed booking.
- Migrated key DAGs to Airflow 3 with deferrable operators, idempotent retries, and explicit data contracts; cut overnight pipeline runtime by ~35%.
- Run on‑call rotations; write the runbooks and the post‑mortems; mentor newer engineers on SQL style, testing discipline, and warehouse cost hygiene.
Data Analyst, Maps · Apple
2020 — 2021 · Cupertino, CA
- Supported Apple Maps data quality and evaluation workflows — wrote SQL and Python to surface ground‑truth discrepancies, routing anomalies, and POI coverage gaps across global regions.
- Built reproducible analyses and dashboards that quantified the impact of map‑data releases on user‑facing metrics; partnered with engineering and operations teams to prioritize fixes.
- Designed sampling and labeling pipelines for human‑in‑the‑loop review of map features, balancing statistical rigor with reviewer throughput.
Graduate Researcher · Columbia University
2018 — 2020 · New York, NY
- Coursework and project work in statistical machine learning, time series, nonparametric methods, and Bayesian inference.
- Independent study: empirical and parametric bootstrap variance estimators for insurance claim counts, validated via Poisson simulation in R.
- Capstone‑style analysis on the Framingham Heart Study using nonparametric survival methods.
Selected Projects
Neural Style Transfer Web App · PyTorch · Flask · Docker
2019
- End‑to‑end Flask service exposing Gatys et al. style transfer; containerized for repeatable deployment. Practical introduction to GPU‑bound serving constraints.
Whole‑Life Insurance Loss Simulation · Excel · VBA
2018
- Simulated the loss‑at‑issue distribution for a fully discrete whole‑life policy under Makeham mortality; reconciled with the MLC Illustrative Life Table.
Education
M.A. Statistics · Columbia University
Advanced Data Analysis · Statistical Machine Learning · Time Series · Nonparametric Statistics · Statistical Inference. GPA 3.43.
2018 — 2020
B.S. Applied Mathematics, Actuarial Emphasis · University of Wisconsin — Madison
Certificate in Business. Early SOA exam progress. Officer, actuarial student organization.
2014 — 2018
Stack
Warehouse Snowflake · PostgreSQL · MySQL
Modeling dbt Core & Cloud · SQL · Jinja
Orchestrate Apache Airflow 3 · GitHub Actions
Cloud AWS (S3, Lambda, EC2, IAM) · Docker
Languages Python · SQL · R · Bash
BI Looker · Tableau
Stats Regression · Bootstrap · Time series · Survival
Practice Data contracts · Testing · On‑call
Spoken Mandarin · English · basic Spanish
Certifications
- Astronomer — Airflow 3 DAG Authoring · 2026
- Astronomer — Airflow 3 Fundamentals · 2026
- AWS Certified Cloud Practitioner · 2025
- TestDome — SQL Certification · 2020
- LinkedIn — Advanced Python · 2019
- LinkedIn — Integrating Tableau & R · 2019