Open to Work

Manisha Yadav

Data Engineer

I build robust, scalable data pipelines that turn raw streams into actionable intelligence. Specializing in cloud-native architectures, real-time processing, and lakehouse design.

About Me

Turning data chaos into clarity

I'm a data engineer with 5+ years of experience designing and maintaining large-scale data infrastructure. I care deeply about pipeline reliability, data observability, and building systems that analysts and scientists can genuinely trust.

Currently seeking senior data engineering roles where data is treated as a first-class citizen. I thrive in fast-moving environments with complex, ambiguous data problems that require both technical depth and cross-functional collaboration.

Outside of work, I write about data architecture and contribute to open-source tooling in the modern data stack ecosystem.

5+ years of experience
12 pipelines in production
3 cloud platforms mastered
99.9% pipeline uptime SLA

Experience

Where I've worked

A track record of building high-impact data systems across industries.

Senior Data Engineer
Jan 2022 — Present
Acme Analytics · Remote
  • Architected an event-driven streaming platform on Apache Kafka, processing 2M+ events/day with sub-100ms latency.
  • Led the migration from on-prem Hadoop to AWS (S3, Glue, Redshift), reducing infrastructure costs by 40%.
  • Implemented a dbt transformation layer that reduced analyst query time from hours to minutes.
  • Built a data quality monitoring framework with Great Expectations, covering 95% of critical pipelines.

Data Engineer
Mar 2020 — Dec 2021
DataWave Inc. · New York, NY
  • Designed and maintained 8 production ETL pipelines with Apache Airflow, reducing data latency by 65%.
  • Collaborated with the ML team to build a feature store on GCP BigQuery, serving 10+ models in production.
  • Introduced CI/CD practices for pipeline deployments, cutting release cycles from weeks to hours.

Junior Data Engineer
Jul 2018 — Feb 2020
Fintech Startup · San Francisco, CA
  • Built REST API ingestion pipelines consolidating data from 15+ third-party financial providers.
  • Developed an internal data catalog using Apache Atlas, improving discoverability across the organization.
  • Owned reporting infrastructure on Looker + PostgreSQL, enabling self-serve analytics for business teams.

Technical Skills

Tools & technologies

The stack I reach for when building data systems — from ingestion to serving.

Languages & Query
Python · SQL · Scala · Bash · Java
Cloud Platforms
AWS · GCP · Azure · Snowflake · Databricks
Processing & Streaming
Apache Spark · Apache Kafka · Apache Flink · dbt · Apache Beam
Orchestration & Monitoring
Airflow · Prefect · Dagster · Grafana · Datadog
Databases & Storage
PostgreSQL · Redshift · BigQuery · MongoDB · Redis · Delta Lake
DevOps & Infrastructure
Docker · Kubernetes · Terraform · GitHub Actions · Helm

Projects

Things I've built

A selection of personal and professional data engineering projects.

Real-Time Analytics Pipeline

End-to-end streaming pipeline ingesting clickstream data from 500K daily users into a lakehouse architecture with live dashboards and ML-based anomaly detection.

Kafka · Spark Streaming · Delta Lake · AWS · Airflow

Open-Source dbt Accelerator

A library of reusable dbt macros and data quality tests covering SCD Type 2 (slowly changing dimension) handling and common data contract patterns.

dbt · Python · Jinja · SQL · CI/CD

Data Quality Observatory

A lightweight data observability framework built on Great Expectations and Grafana, tracking schema drift, null rates, freshness, and SLA breaches in real time.

Python · Great Expectations · Grafana · Airflow

Blog

Writing on data engineering

I write about data architecture, pipeline design, and what I'm learning along the way.

Contact

Let's work together

I'm actively looking for my next role as a senior data engineer. If you're building a world-class data platform and need someone who cares about quality, reliability, and craft — let's talk.