Hi, I’m

Mustafa Mirza

Karachi

Summary

Expert in Apache tools, AWS, Kubernetes, and more. Brings a wealth of hands-on experience and problem-solving skills to ensure efficient data governance, compliance, and robust data-serving solutions. Has a keen eye for detail and specializes in designing, developing, and maintaining highly scalable, secure, and reliable data structures. Works closely with system architects, software architects, and design analysts. Adept at understanding business and industry requirements to develop comprehensive data models. Excels in developing strategic database architectures at the modeling, design, and implementation stages.

Overview

3
years of professional experience

Work History

HugoBank

Lead Platform Data Engineer
07.2024 - Current

Job overview

  • Data Architecture Development: Spearheaded design and implementation of a scalable, modular data architecture leveraging Hudi and OpenMetadata, with workflows orchestrated by Airflow coupled with Cosmos. Designed ingestion pipelines using Spark and Flink to handle high-volume, real-time data streams, ensuring seamless integration and transformation across diverse data sources.
  • Data Strategy: Developed and executed long-term data strategies that prioritized stakeholder-driven use cases, addressing scalability and modularity while enabling adaptability across cloud service providers (AWS, GCP, and Azure). Achieved a potential 50% reduction in costs compared to Databricks through innovative modular designs.
  • Analytics Enablement: Deployed Superset to support advanced business intelligence (BI) capabilities, empowering teams with real-time data exploration and custom dashboards for actionable insights.
  • Data Drift and Schema Evolution: Addressed challenges in data drift and schema evolution using DBT and Trino, ensuring continuous alignment between dynamic business needs and data pipelines.
  • Cost Optimization: Designed a modular platform strategy compatible with any cloud service provider, achieving a 50% cost reduction potential compared to Databricks through tool optimization and platform scalability.
  • Data Governance: Established robust frameworks with RBAC, PII compliance, and centralized governance policies using Apache Ranger, ensuring secure, scalable, and compliant data management in alignment with SBP guidelines.
  • Data Platform Enablement: Architected a data platform with high-performance compute resources and scalable storage, automating ingestion pipelines with Airflow and Cosmos, supporting real-time processing and reporting workflows.
  • Leadership: Built a high-performing engineering team skilled in Spark, Flink, DBT, and Hudi, fostering innovation in areas like data analytics and platform automation.

Bazaar Technologies

Senior Software Engineer - Big Data and Platform
07.2022 - 06.2024

Job overview

  • Spearheaded initiatives to enhance Data Quality within our Data Warehouse, using DBT for lineage maintenance and regular testing to ensure data freshness/correctness.
  • Managed and built scalable data pipelines using Apache Hudi and Spark on a reliable underlying architecture on AWS EKS and EMR, ingesting 100+ terabytes of data, with Karpenter-enabled scaling to absorb unforeseen increases in data traffic.
  • Scheduled data workflows using Airflow and DBT to create and maintain 400+ data products, with data lineage to ensure upstream errors do not affect downstream data products.
  • Built an analytics platform using Apache Superset and Tableau for data exploration and visualization, with 1000+ dashboards created by users on the platform.
  • Monitored the architecture using Prometheus and Loki, exporting JVM/service metrics to Grafana, where we designed easy-to-read, intuitive dashboards for service health and uptime.
  • Implemented Apache Ranger to ensure GDPR compliance and fine-grained data governance across our users, so that data is exposed only to those with the correct clearance level.
  • Implemented cost-saving measures on AWS by optimizing Spark jobs (including reducing their NAT gateway usage), monitoring resources with CloudWatch, leveraging Karpenter auto-scaling to improve resource utilization, and optimizing S3 operations using metadata tables.
  • Achieved $12,000/month cost reduction.
  • Developed a DAAS service in Go to serve data to our applications reliably, powered by the Trino query engine with Apache Pinot as our OLAP store.

Education

FAST-NUCES
Karachi, SD

Bachelor's in Computer Science
08.2022

University Overview

GPA: 3.48

Skills

  • AWS
  • Machine learning
  • Kubernetes
  • Python
  • Apache tools
  • SQL
  • Spark
  • Data governance
  • ETL development
  • Data security

Projects

Financial Data Platform – AWS-Based, Regulatory-Compliant

Designed and deployed a robust, modular, AWS-based data platform for a modern digital bank, ensuring compliance with security regulations and enabling flexible tool integration. The platform supported high-quality, real-time data analytics and operational scalability.


Architecture Highlights:

  • Ingestion Pipelines: Engineered high-performance ingestion pipelines using Apache Spark and Apache Flink for diverse data sources (RDS, DocumentDB, APIs), enabling efficient and reliable real-time data integration.
  • Data Lakes and Modeling: Implemented Apache Hudi to manage raw, silver, and gold layers with version-controlled, scalable transformations. Used DBT with Trino to address data drift and schema evolution challenges, ensuring accurate, reliable analytics.
  • Orchestration and Automation: Orchestrated and automated complex data workflows with Apache Airflow and Cosmos, enhancing task execution efficiency and operational reliability.
  • Advanced Analytics: Deployed Apache Superset for real-time business intelligence and interactive dashboards, enabling business teams to make data-driven decisions with up-to-date insights.
  • Data Governance: Integrated Apache Ranger for comprehensive data governance, implementing role-based access control (RBAC), data masking, and compliance with SBP guidelines and other regulatory standards.

Timeline

Lead Platform Data Engineer

HugoBank
07.2024 - Current

Senior Software Engineer - Big Data and Platform

Bazaar Technologies
07.2022 - 06.2024

FAST-NUCES

Bachelor's in Computer Science