Senior AI Engineer
TARGET
Total years of experience: 10 years, 5 months
• Led a team of 4 engineers in developing and maintaining a high-performance, distributed feature dataset with more than 200 features
• Built and managed data pipelines that extracted data from diverse sources, transformed it into usable formats, and loaded it into storage and analytics platforms
• Designed and maintained a framework for automated ETL processes, ensuring reliable execution of data integration and transformation tasks while minimizing manual intervention
• Led the migration of on-premise data systems to Google Cloud Platform (GCP) with minimal disruption, improving efficiency of data storage, processing, and management
• Developed a library of common PySpark functions and deployed it to a shared virtual environment used across multiple teams, reducing development time
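A minimal sketch of the kind of helper such a shared library might contain; the function name and normalization rule are illustrative assumptions, not the actual library's API. The core logic is kept as a plain Python function so it can be unit-tested without a Spark session and then registered as a UDF via `pyspark.sql.functions.udf` where needed.

```python
import re

# Hypothetical shared helper: in the real library, a function like this
# would be wrapped as a PySpark UDF; the pure-Python core stays testable.
def normalize_column_name(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs to underscores."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return name.strip("_").lower()
```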
• Instituted data quality measures, data governance protocols, and data validation checks, achieving a 60% reduction in data errors and improving the accuracy of downstream analyses and decision-making
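As an illustrative sketch of a row-level validation check of the kind described above, the field names and rules below are hypothetical, not taken from the actual pipeline:

```python
# Hypothetical row-level validation: returns a list of error strings
# for one record (an empty list means the record passed all checks).
def validate_record(record: dict) -> list:
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    amount = record.get("amount")
    if amount is not None and amount < 0:
        errors.append("negative amount")
    return errors
```

Checks like this are typically run per-batch, with failing records routed to a quarantine table rather than dropped silently.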
• Designed and maintained data warehouses/data lakes to store structured and unstructured data efficiently, defining schemas and optimizing for query performance
• Migrated several CPU-based AI/ML models to GPU execution using Docker, Kubernetes, and GPU Array, improving efficiency across multiple models and reducing runtime
• Provided technical mentorship to junior team members, conducting code and design reviews, and enforcing coding standards and best practices
• Led the migration of petabytes of unstructured/semi-structured data from legacy systems (Teradata, CR, and Informatica) to AWS
• Built and maintained a data lake housing more than 1 PB of data, enabling data-driven decision-making for critical business initiatives
• Developed an efficient framework for staging, cleansing, transforming, and loading data using HDP, HDFS, Spark, Hive, and Sqoop
• Optimized multiple batch and stream processing workflows for increased performance and reliability
• Worked closely with the data science team to understand their requirements and deliver data in the necessary formats
• Designed and implemented a real-time data pipeline for processing semi-structured data, ingesting 150 million raw records from more than 30 data sources using Kafka and PySpark
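A sketch of the kind of flattening step such a pipeline applies when landing semi-structured Kafka payloads; the JSON layout and function name are assumptions for illustration. In the actual pipeline, logic like this would run inside a PySpark transformation over the Kafka message stream:

```python
import json

# Hypothetical message-flattening step: parse one JSON Kafka payload
# and flatten nested keys into dotted paths for tabular landing.
def flatten_event(raw: bytes) -> dict:
    event = json.loads(raw)
    flat = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for k, v in value.items():
                walk(f"{prefix}.{k}" if prefix else k, v)
        else:
            flat[prefix] = value

    walk("", event)
    return flat
```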
• Developed an in-house Python library for parsing and reformatting data from external vendors, reducing the data pipeline error rate by 7%
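A minimal sketch of the kind of vendor-file reformatting such a library performs; the pipe delimiter, column handling, and function name are illustrative assumptions, not the library's real interface:

```python
import csv
import io

# Hypothetical vendor-file parser: read pipe-delimited text, normalize
# header names, strip whitespace from values, and skip blank rows.
def reformat_vendor_rows(text: str) -> list:
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = []
    for row in reader:
        cleaned = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        if any(cleaned.values()):
            rows.append(cleaned)
    return rows
```

Normalizing headers and values at ingest keeps vendor-specific quirks out of downstream transformations, which is one common way such a library lowers pipeline error rates.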
• Created various lambda functions for data cleansing and transformation using Scala and the Spark API