DATA ENGINEER • CLOUD • PIPELINES

Hi, I’m Piyusha
I build scalable data systems.

I design reliable ETL workflows, improve data quality, and ship analytics-ready datasets using SQL, Python, Spark, Airflow, and cloud services.

PySpark · Airflow · Snowflake · AWS · Databricks
4+ Years of Data Engineering
Entrepreneurship & Product Thinking
Marketing & Event Leadership

About

Who I am

I’m a Data Engineer who enjoys building data systems that teams can trust. My work sits at the intersection of engineering and analytics — turning raw, inconsistent inputs into clean, query-ready datasets that power dashboards, reporting, and product decisions.

I care about the details that make pipelines production-ready: data contracts, validation rules, monitoring, and recoverability. I also love simplifying complexity, designing workflows that are easy to maintain, easy to debug, and built to scale.

Beyond tools and pipelines, I value clarity and ownership. I take responsibility for the full lifecycle of data, from ingestion to consumption, ensuring definitions are aligned, assumptions are documented, and downstream users can rely on the data. My goal is to make data products feel boringly reliable.

40% faster delivery through pipeline optimization
200+ data quality issues fixed across client setups
30% less downtime via KPI monitoring
Data quality checks · SLA-first pipelines · Cost-aware design · Clear documentation · Ownership mindset · Stakeholder-ready outputs

How I work

01

Design for trust

Define clear inputs/outputs, add validation checks, and keep datasets reproducible (see the validation sketch after this list).

02

Make it observable

Monitor freshness, anomalies, failures, and SLA risks so issues are caught early (see the monitoring sketch below).

03

Keep it maintainable

Use simple patterns, reusable components, and docs so pipelines scale with the team.

04

Ship with impact

Work backward from the question and deliver clean tables that drive decisions.

05

Communicate clearly

Write crisp updates, align definitions, and make trade-offs visible to stakeholders.

06

Own it end-to-end

Take responsibility for delivery, quality, and long-term reliability — not just code.
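
To make step 01 concrete, here is a minimal sketch of the kind of validation gate I mean, written in PySpark. Everything specific is an assumption for illustration: the raw.orders and analytics.orders_clean table names, the required columns, and the row-count floor are placeholders, not a real client setup.

# Validation gate: refuse to publish data that breaks the contract.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_validation").getOrCreate()

df = spark.read.table("raw.orders")  # hypothetical source table

# 1. Schema contract: the columns downstream users depend on must exist.
required = {"order_id", "customer_id", "order_ts", "amount"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Schema contract violated; missing columns: {missing}")

# 2. Key integrity: no null primary keys.
null_keys = df.filter(F.col("order_id").isNull()).count()
if null_keys:
    raise ValueError(f"{null_keys} rows have a null order_id")

# 3. Volume sanity: catch silently empty or truncated loads.
row_count = df.count()
if row_count < 1_000:  # illustrative floor; tuned per source in practice
    raise ValueError(f"Suspiciously low row count: {row_count}")

# Only data that passes every check is published for consumers.
df.write.mode("overwrite").saveAsTable("analytics.orders_clean")

Failing loudly before publishing keeps bad data out of dashboards, and reading from an immutable raw table keeps the run reproducible.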

What you’ll get
Trust
Validated datasets + clear definitions
Visibility
Monitoring for freshness, failures, and SLAs
Speed
Clean tables that power decisions faster
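
And to make the monitoring promise concrete, here is a sketch of a freshness check as an hourly Airflow DAG, assuming Airflow 2.4+ with the Snowflake provider installed. The connection ID, table, load_ts column, six-hour staleness threshold, and SLA are all placeholder assumptions.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def check_orders_freshness(**_):
    # Import inside the task so the provider isn't needed at DAG-parse time.
    from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

    hook = SnowflakeHook(snowflake_conn_id="snowflake_default")  # assumed conn ID
    # Assumes load_ts is stored as UTC without a timezone (TIMESTAMP_NTZ).
    latest = hook.get_first("select max(load_ts) from analytics.orders_clean")[0]
    if latest is None or datetime.utcnow() - latest > timedelta(hours=6):
        raise RuntimeError(f"analytics.orders_clean looks stale (last load: {latest})")


with DAG(
    dag_id="orders_freshness_monitor",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="check_orders_freshness",
        python_callable=check_orders_freshness,
        sla=timedelta(minutes=30),  # flag the check itself if it runs late
    )

A failed run alerts before stakeholders notice a stale dashboard, which is the point: catch freshness and SLA risks upstream rather than in a meeting.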

Skills


Data Engineering

Pipelines, orchestration, reliability

SQL · Python · Airflow · Spark / PySpark · Batch + incremental loads · CDC patterns · Backfills & reprocessing · Data modeling (analytics-ready) · Data contracts · Schema evolution
SLAs · Alerts · RCA

Cloud & Warehousing

Scalable, cost-aware systems

Snowflake · AWS (S3, EC2, Glue, Lambda) · Databricks · CloudWatch monitoring · RDS · Cost optimization · Security basics (IAM) · Azure · GCP
Performance · Partitioning · Cost

Analytics & Delivery

From raw data → decisions

Power BI · Tableau · Analytics engineering · Metric definitions · Dashboard-ready tables · Data quality checks · Freshness / volume monitoring · REST APIs · Git · Jira · Agile · Stakeholder management
Stakeholders · Requirements · Shipping

Education

M.S. Information Systems

New Jersey Institute of Technology

GPA: 3.95 / 4.0
Focus: Data • Analytics • Cloud

B.E. Electronics & Communication

Punjab Engineering College

GPA: 3.70 / 4.0
Strengths: Systems • Problem-solving

Certifications

Hands-on credentials across data + analytics.

IBM Big Data
BCG GenAI
PwC Power BI
Accenture / KPMG Analytics

Let’s talk

Want to collaborate or have a role in mind? Send a message — I’ll reply soon.