Big Data Engineer/Architect
Allied Bank Limited
Total experience: 8 years, 10 months
Project: Data Lake and Data Warehouse pipeline development using Oracle Data Integrator, Informatica PowerCenter & BDM, Hadoop (Cloudera), Hive, Spark, Sqoop, and Kafka
Key Responsibilities:
• Technical transformation of bank data business requirements into logical and physical Data Lake and Data Warehouse models, in support of the OLTP T24 core banking system and other banking source systems.
• End-to-end ETL and ELT development for 16 sources, preparing data for predictive and descriptive reporting and for data services
• Responsible for writing new and optimizing existing ETL processes using PL/SQL and T-SQL
• Informatica ETL development using Informatica BDM and PowerCenter
• Oracle Data Integrator ETL development for Change Data Capture (CDC) and batch processing
• Ingestion, transformation, and processing of structured and unstructured data into the data lake using ODI and the Hadoop ecosystem
• Spark job development and optimization for distributed in-memory processing
• ODI lifecycle component development, i.e., models, mappings (with transformations such as lookups, filters, joins, aggregates, and knowledge modules), variables, procedures, data quality controls, packages, scenarios, complex files, load plans, and other components.
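The CDC and batch-processing bullets above can be sketched in miniature. The following is a hypothetical, stdlib-only Python sketch (function and field names are my own, not from any project code) of the core merge step a CDC flow performs: applying a batch of journalized insert/update/delete records to a target table keyed by primary key.

```python
# Minimal sketch of a CDC merge step (hypothetical, stdlib only):
# apply a batch of captured change records (I/U/D) to a target
# table keyed by primary key, the way a CDC mapping merges
# journalized source rows into the warehouse.

def apply_cdc_batch(target: dict, changes: list) -> dict:
    """Apply insert/update/delete change records to a keyed target."""
    for change in changes:
        op, key, row = change["op"], change["pk"], change.get("row")
        if op in ("I", "U"):      # insert or update -> upsert the row
            target[key] = row
        elif op == "D":           # delete -> drop the key if present
            target.pop(key, None)
    return target

target = {1: {"name": "ACC-1", "balance": 100}}
batch = [
    {"op": "U", "pk": 1, "row": {"name": "ACC-1", "balance": 150}},
    {"op": "I", "pk": 2, "row": {"name": "ACC-2", "balance": 50}},
    {"op": "D", "pk": 3},
]
result = apply_cdc_batch(target, batch)
print(result)
```

In a real ODI flow the same logic is expressed declaratively through journalizing and integration knowledge modules rather than hand-written code; the sketch only shows the semantics of the merge.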
Project: Chevron Corporation (Oil Industry) | Azure Data Lake and DWH Pipeline
Key Responsibilities:
• Created pipelines in ADF using linked services to ELT data from sources such as Azure SQL, Blob Storage, APIs, and Azure SQL Data Warehouse; used Event Hubs and Event Grid for ingestion.
• Developed Spark applications using Scala and Spark SQL on Databricks
• Used HDInsight, Jupyter notebooks, and the Spark shell to develop, test, and optimize Spark jobs
• Performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, memory tuning, and writing UDFs
• Deployed and tested (CI/CD) our developed code using Visual Studio Team Services (VSTS).
• Participated in Service Bus and Cosmos DB performance improvements
• Used Event Hubs for stream data ingestion (Avro) into speed-layer processing and Redshift
• Participated in preparing frameworks for unit test cases and integration testing
• Developed Spark jobs using Spark SQL queries and the DataFrame and Dataset APIs
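The batch-interval tuning mentioned above can be illustrated with a toy micro-batch aggregator in pure Python (all names hypothetical, no Spark dependency): events are bucketed into fixed batch windows and counted per key, mirroring what a Spark Streaming job does once a batch interval is chosen. A longer interval lowers per-batch overhead at the cost of latency.

```python
from collections import defaultdict

# Toy sketch of micro-batch processing (hypothetical, stdlib only):
# events carry a timestamp; we bucket them into fixed batch windows
# of batch_interval_s seconds and aggregate counts per key.

def micro_batch_counts(events, batch_interval_s):
    """Group (timestamp, key) events into batch windows and count keys."""
    batches = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window = ts - (ts % batch_interval_s)   # start of the batch window
        batches[window][key] += 1
    return {w: dict(counts) for w, counts in batches.items()}

events = [(0, "click"), (2, "click"), (4, "view"), (5, "click"), (9, "view")]
result = micro_batch_counts(events, batch_interval_s=5)
print(result)
```

The same trade-off appears in Spark Streaming as the batch-interval parameter: the sketch's `batch_interval_s` plays the role of that setting.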
Projects: KPN - Netherlands | VEON - Netherlands | BICS - Belgium
Key Responsibilities:
• On-prem and AWS Data Lake administration and development using Hadoop, Spark, AWS EMR, and Glue
• Hive HQL and Bash scripting; used the Spark DataFrame API on the Cloudera platform and on AWS EMR.
• Designed and implemented NiFi ETL processors for orchestration, data lineage, and data profiling
• End-to-end Informatica PowerCenter and BDM ETL flow development using the Blaze and Spark engines
• Sqoop and Kafka data-stream ingestion development; Kerberos, Knox, and Ranger management
• Data movement among the Teradata DWH, landing servers, and the Hadoop Data Lake using ETL and ELT
• Training delivery for Hadoop and Big Data projects
Projects: Yahoo Clicks-co - UK | Halfords - UK | Auchan - France
Key Responsibilities:
• Data analysis and reporting using an in-house analytics tool (Loss Manager) and ETL through Pentaho
• Business intelligence solutions for affiliate marketing and clickstream data using Oracle and Postgres
• Finalization of KPIs and BI dashboards to best showcase insights to business stakeholders
• Transforming raw data from multiple sources into meaningful information for business analysis
• Integrating web analytics with transactional and custom analytics, and planning for Big Data use cases
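The clickstream work above can be illustrated with a small, hypothetical sessionization pass (names and timeout are my own): a visitor's raw page-view timestamps are split into sessions whenever the gap between consecutive events exceeds a timeout, a standard first step before BI aggregation.

```python
# Hypothetical sketch of clickstream sessionization (stdlib only):
# split a visitor's event timestamps into sessions, starting a new
# session whenever the gap between events exceeds timeout_s.

def sessionize(timestamps, timeout_s=1800):
    """Return a list of sessions, each a list of sorted timestamps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout_s:
            sessions[-1].append(ts)   # within timeout -> same session
        else:
            sessions.append([ts])     # gap too large -> new session
    return sessions

# Visitor events at t=0s and t=100s, then a ~2h gap, then two more views.
result = sessionize([0, 100, 7300, 7400], timeout_s=1800)
print(result)  # → [[0, 100], [7300, 7400]]
```

Session counts and durations derived this way feed directly into the KPIs and dashboards described above.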
Activities and societies: President - NUCES ACM/IEEE Society. FAST-NUCES is one of the best computer science universities in the country. The journey at FAST is tough, but it teaches you how to deal with real-life problems, tight deadlines, conflict resolution, and a range of technical skills.