Résumé · Junjie (Caspar) Chen · MMXXVI

Junjie Chen, who goes by Caspar Chen

Data Engineer · Analytics Engineering · Statistics

Data engineer with a statistics background and seven years of moving data from "raw and suspicious" to "trusted and reportable." Strongest where SQL, dbt, Airflow, and Snowflake meet — and where someone needs to write the contract that keeps the numbers honest.

Experience

Data Engineer · Taskrabbit

  • Own large portions of the analytics data platform on Snowflake, dbt, and Airflow 3; design and maintain the bronze → silver → gold modeling layers consumed by Growth, Ops, Finance, and Trust.
  • Authored 200+ dbt models and a shared library of macros & tests that standardized incremental logic, surrogate keys, and SCD‑2 handling across teams.
  • Designed the canonical task‑lifecycle fact table — one row per task with all state transitions — eliminating recurring cross‑team disputes over the definition of a completed booking.
  • Migrated key DAGs to Airflow 3 with deferrable operators, idempotent retries, and explicit data contracts; cut overnight pipeline runtime by ~35%.
  • Run on‑call rotations; write the runbooks and the post‑mortems; mentor newer engineers on SQL style, testing discipline, and warehouse cost hygiene.

Data Analyst, Maps · Apple

  • Supported Apple Maps data quality and evaluation workflows — wrote SQL and Python to surface ground‑truth discrepancies, routing anomalies, and POI coverage gaps across global regions.
  • Built reproducible analyses and dashboards that quantified the impact of map‑data releases on user‑facing metrics; partnered with engineering and operations teams to prioritize fixes.
  • Designed sampling and labeling pipelines for human‑in‑the‑loop review of map features, balancing statistical rigor with reviewer throughput.

Graduate Researcher · Columbia University

  • Coursework and project work in statistical machine learning, time series, nonparametric methods, and Bayesian inference.
  • Independent study: empirical and parametric bootstrap variance estimators for insurance claim counts, validated via Poisson simulation in R.
  • Capstone‑style analysis on the Framingham Heart Study using nonparametric survival methods.

Selected Projects

Neural Style Transfer Web App · PyTorch · Flask · Docker

  • End‑to‑end Flask service exposing Gatys et al. style transfer; containerized for repeatable deployment. Practical introduction to GPU‑bound serving constraints.

Whole‑Life Insurance Loss Simulation · Excel · VBA

  • Simulated the loss‑at‑issue distribution for a fully discrete whole‑life policy under Makeham mortality; reconciled with the MLC Illustrative Life Table.

Education

M.A. Statistics · Columbia University

Advanced Data Analysis · Statistical Machine Learning · Time Series · Nonparametric Statistics · Statistical Inference. GPA 3.43.

B.S. Applied Mathematics, Actuarial Emphasis · University of Wisconsin–Madison

Certificate in Business. Early SOA exam progress. Officer, actuarial student organization.

Stack

Warehouse Snowflake · PostgreSQL · MySQL
Modeling dbt Core & Cloud · SQL · Jinja
Orchestrate Apache Airflow 3 · GitHub Actions
Cloud AWS (S3, Lambda, EC2, IAM) · Docker
Languages Python · SQL · R · Bash
BI Looker · Tableau
Stats Regression · Bootstrap · Time series · Survival
Practice Data contracts · Testing · On‑call
Spoken Mandarin · English · basic Spanish

Certifications