Big Data Cloud Engineer - AXA Gulf, Dubai
Cognizant - United Arab Emirates
Total years of experience: 9 years, 5 months
The AXA Gulf Cloud Data Lake project is part of a strategic program to build a cloud-native Data Lake for analytics on AWS.
When finished, the project will provide cataloging, curation, and exploitation layers for data sourced from the core policy administration system, the sales and marketing system, and MDM systems.
The Data Lake is built on native Amazon services, closely mirroring the AWS reference architecture, with AWS Glue as the backbone for large-scale, parallel data processing.
I authored the core ETL workflows on Glue, along with the orchestration and scheduling pipeline in AWS Step Functions.
Led the first implementation of large-scale Glue and Step Functions in the Gulf and Middle East insurance sector.
Implemented a Bloom-filter-driven large-scale key comparison design to reduce lookup time.
Developed automated testing for code changes to reduce deployment turnaround time.
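The Bloom-filter idea behind that key comparison can be sketched in plain Python. This is a minimal illustration only, not the Glue implementation; the hash scheme, bit-array size, and the sample policy keys are assumptions:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic membership test with
    no false negatives (a key that was added always hits)."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Build the filter from the existing key set; an incoming key that the
# filter has definitely not seen can skip the expensive full lookup.
existing = BloomFilter()
for k in ("POL-1001", "POL-1002"):      # hypothetical policy keys
    existing.add(k)

assert existing.might_contain("POL-1001")   # added keys always hit
```

The win is that only keys the filter *might* contain pay for the full comparison; the rest are rejected after a handful of bit tests.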
This project migrated Ab Initio jobs to Spark jobs, preserving the same data delivery to other internal business and BI teams.
Transformations are written in Scala and applied to the data using Spark SQL. The framework is property-file driven: for every new entity the base model (import, staging, etc.) remains the same, while custom transforms are used in plug-and-play mode. All data comes from RDBMS sources and is kept in Hive in the data lake. The processed data is consumed by the analytics team for reporting.
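The plug-and-play pattern can be sketched language-agnostically; the actual framework is Scala/Spark SQL, so the Python below is a hypothetical miniature, and the transform names and property keys are invented:

```python
# Custom transforms registered by name; the base pipeline stages
# (import, staging) stay fixed per entity.
def uppercase_name(rows):
    return [{**r, "name": r["name"].upper()} for r in rows]

def drop_inactive(rows):
    return [r for r in rows if r["active"]]

TRANSFORMS = {"uppercase_name": uppercase_name,
              "drop_inactive": drop_inactive}

def run_pipeline(rows, properties):
    # Only the entity's custom transforms are read from the property
    # file; adding a new entity means adding a property entry, not code.
    for name in properties["transforms"].split(","):
        rows = TRANSFORMS[name.strip()](rows)
    return rows

props = {"transforms": "drop_inactive, uppercase_name"}  # per-entity property file
data = [{"name": "alice", "active": True},
        {"name": "bob", "active": False}]
print(run_pipeline(data, props))  # [{'name': 'ALICE', 'active': True}]
```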
I also built a property-file-driven DataFrame testing framework to automate QA jobs. Using this framework, one can perform DataFrame testing activities such as DataFrame comparison and transformation checks, and run any number of tests at the same time.
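The core of such a DataFrame-comparison check can be sketched without Spark. This is a simplified stand-in (lists of dicts instead of DataFrames, a single key column); the real framework compares Spark DataFrames:

```python
def compare_frames(expected, actual, key):
    """Compare two tables (lists of dicts) on a key column and report
    missing, extra, and value-mismatched rows by key."""
    exp = {r[key]: r for r in expected}
    act = {r[key]: r for r in actual}
    missing = sorted(set(exp) - set(act))        # in expected only
    extra = sorted(set(act) - set(exp))          # in actual only
    mismatched = sorted(k for k in set(exp) & set(act)
                        if exp[k] != act[k])     # same key, different values
    return {"missing": missing, "extra": extra, "mismatched": mismatched}

report = compare_frames(
    [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}],
    [{"id": 1, "amt": 10}, {"id": 2, "amt": 99}, {"id": 3, "amt": 5}],
    key="id",
)
# report == {"missing": [], "extra": [3], "mismatched": [2]}
```

In Spark the same effect is typically achieved with outer joins or `exceptAll` between the two DataFrames.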
This project migrated data from structured and unstructured mainframe-generated files (CSV/Excel/raw) to Hive tables, applying custom logic on top of the data for direct business use.
This is a development project for The Hartford insurance company to ingest third-party structured and unstructured data into the Hadoop ecosystem. The data sources comprise Oracle and a raw file server. Raw data was ingested into HDFS, and transformation logic was implemented using Hive, later Spark SQL. The processed data was consumed by the BI team for reporting.
Apart from that, I implemented a data quality module to validate, for specified key columns, whether any post-transformation data was corrupted with respect to the business logic.
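A key-column quality check of that kind can be sketched as follows. This is a generic illustration, not the Hartford module; the column names and the specific rules (null keys, duplicate keys) are assumptions standing in for the actual business logic:

```python
def check_key_columns(rows, key_columns):
    """Flag rows whose key columns are null/empty after transformation,
    and rows whose key tuple duplicates an earlier row."""
    null_keys = [i for i, r in enumerate(rows)
                 if any(r.get(c) in (None, "") for c in key_columns)]
    seen, duplicate_keys = set(), []
    for i, r in enumerate(rows):
        k = tuple(r.get(c) for c in key_columns)
        if k in seen:
            duplicate_keys.append(i)
        seen.add(k)
    return {"null_keys": null_keys, "duplicate_keys": duplicate_keys}

rows = [{"policy": "P1", "amt": 1},
        {"policy": None, "amt": 2},   # corrupted: key went null
        {"policy": "P1", "amt": 3}]   # corrupted: duplicate key
print(check_key_columns(rows, ["policy"]))
# {'null_keys': [1], 'duplicate_keys': [2]}
```

A non-empty report fails the batch before the BI team ever sees the data.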
This is an ingestion and data transformation framework project for The Hartford insurance company.
EDIS is a framework developed as a single point of solution for data ingestion, data quality management, and data transformation. Data is stored in HDFS; Hive, Sqoop, and Spark SQL are used for the major activities of the EDIS module. Data sources were both structured and unstructured, spanning RDBMS and mainframe files. All jobs were configured via HBase tables accessed through Apache Phoenix. Batch creation, data ingestion, and batch-ID tagging were common to every ingestion process, whereas data quality checks and transformation services could be selectively chosen. A common interface was used for all kinds of file ingestion, and every module could be used in plug-and-play fashion by changing the property tables.
The above-mentioned EDIS project was extended to ingest Teradata data. An application was developed using Java and the Hadoop framework, mainly to offload data from the Teradata platform, store it in HDFS and Hive, and export it to Oracle when needed. We used the Teradata JDBC connector to ingest the data via Sqoop, and prepared scripts to validate the data between HDFS and Hive.
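The validation step can be sketched as an order-independent table fingerprint: compare a row count plus an XOR of per-row hashes on each side, so row order does not matter. This is a generic reconciliation sketch under my own assumptions, not the actual validation scripts:

```python
import hashlib

def fingerprint(rows, columns):
    """Order-independent table fingerprint: (row count, XOR of per-row
    hashes over the chosen columns). XOR is commutative, so the same
    rows in any order produce the same fingerprint."""
    acc = 0
    for r in rows:
        payload = "|".join(str(r[c]) for c in columns)
        acc ^= int.from_bytes(hashlib.md5(payload.encode()).digest()[:8], "big")
    return len(rows), acc

# Hypothetical source (Teradata extract) vs target (Hive) samples.
src = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
tgt = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]   # same rows, shuffled

assert fingerprint(src, ["id", "v"]) == fingerprint(tgt, ["id", "v"])
```

In practice each side's fingerprint would be computed close to the data (e.g. a Hive query on the target) and only the two small tuples compared.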
Travelport was a T&H client with a particular requirement: migrating GDS data and functionality to Hadoop, plus custom Java application development. I was mainly involved in data migration through Sqoop to Hive and in rewriting the existing fare-generation logic in Java. By design, the GDS system remained the main data source, but for recursive fare generation of similar entities and for popular searches, the data was migrated to Hive; periodic flight-schedule and fare updates were needed for quick service and accurate ticketing.
I also wrote scripts to validate Hive data against real-time GDS search results.
Completed B.Tech in Information Technology; my B.Tech degree certificate is already UAE-legalized. Major project: a usability-metric-based recommendation system using customer reviews available on eCommerce websites. Used Selenium to scrape review data from Flipkart (a leading Indian eCommerce website) for any selected product, and devised an NLP-based algorithm to determine which product best fits the customer's requirements.