Mustafa Mirza

Summary

Senior Data Engineer with expertise in designing scalable Data Mesh architectures, optimizing low-latency data processing using Apache Spark, Apache Flink, and Apache Kafka, and implementing robust microservices-based distributed systems. Experienced in developing and automating real-time ingestion pipelines with Airflow, DBT, and Trino on AWS. Strong background in data governance (RBAC, PII compliance, Apache Ranger) and performance optimization. Passionate about leveraging open-source technologies to build self-service, high-performance data platforms.

Overview

years of professional experience

Work History

Lead Platform Data Engineer

HugoBank

07.2024 - Current

Designed and implemented low-latency ingestion pipelines leveraging Apache Spark, Apache Flink, and Kafka, enabling real-time processing and reducing data ingestion latency by 40% for scalable Data Mesh solutions.
Optimized data storage and retrieval with Hudi and Trino, and strengthened governance with Open Metadata, ensuring 99.9% data availability and reducing query response times by 50%.
Developed and executed Data Mesh-driven strategies, implementing self-service data platforms that broadened data democratization and tripled data access speed across distributed teams.
Built microservices-based architectures for event-driven data processing, leading to a 30% improvement in pipeline scalability and integration across AWS, GCP, and Azure.
Automated data workflows with Airflow and Cosmos, reducing manual intervention by 70%, ensuring faster and more reliable data operations.
Led a high-performing team of data engineers, driving a 25% increase in engineering productivity across data analytics and platform automation.

Senior Software Engineer - Big Data and Platform

Bazaar Technologies

07.2022 - 07.2024

Spearheaded scalable Data Warehouse initiatives, ensuring data quality and lineage with DBT and reducing data freshness lag from 12 hours to under 30 minutes.
Developed and maintained 400+ data pipelines using Apache Hudi, Spark, and Airflow, processing 100+ terabytes of data daily while improving pipeline execution efficiency by 60%.
Built an enterprise-grade analytics platform with Apache Superset and Tableau, enabling teams to create 1,000+ dashboards and increasing real-time reporting accuracy by 35%.
Enhanced architecture monitoring by integrating Prometheus and Loki, reducing MTTR (Mean Time to Resolution) for system failures by 50% and improving overall service uptime to 99.98%.
Implemented fine-grained data governance policies using Apache Ranger, ensuring 100% GDPR compliance and reducing unauthorized data access incidents by 80%.
Optimized AWS infrastructure costs, saving $12,000/month through Spark job optimizations, improved auto-scaling with Karpenter, and more efficient S3 storage.
Developed a Data-as-a-Service (DaaS) solution using GoLang, Trino, and Apache Pinot, reducing query response times from 5 seconds to under 500ms, supporting millions of analytical queries per day.

Education

Bachelor's - Computer Science

FAST-NUCES

08.2022

Skills

Big Data & Streaming: Apache Spark, Apache Flink, Apache Kafka, Hadoop
Cloud & Orchestration: AWS (S3, EMR, Glue, Lambda), Airflow, Kubernetes, Docker, Terraform
Data Processing & Storage: Trino, Hive, Hudi, DBT, Redshift, Pinot
Programming & Development: Python, SQL, Scala, GoLang, Bash
Architecture & Governance: Data Mesh, Microservices, Data Security (RBAC, PII Compliance, Apache Ranger), ETL Development, Cost Optimization

Projects

Financial Data Platform – AWS-Based, Regulatory-Compliant

Technologies: Apache Spark, Flink, Kafka, Airflow, Hudi, Trino, Open Metadata, AWS (S3, EMR, Glue), Python, Scala

Architected a self-service Data Mesh platform, enabling distributed teams to autonomously manage, process, and access high-quality data.
Built ingestion pipelines with Spark and Flink, integrating streaming and batch data sources into a centralized Data Lake.
Managed schema evolution and data drift using DBT on Trino, maintaining high data integrity.
Integrated Open Metadata to enhance data cataloging, lineage tracking, and discoverability across multiple business units.

Enterprise Data Mesh Platform

Technologies: AWS (S3, EMR, Glue), Spark, Airflow, DBT, Apache Ranger, Trino

Developed an AWS-based modular data platform, ensuring compliance with financial regulations and security best practices.
Automated ETL processes using Airflow and DBT, enabling real-time reporting and predictive analytics.
Implemented Apache Ranger for role-based access control (RBAC), data masking, and regulatory compliance.
