Data Engineer
DHL
Total years of experience: 9 years, 10 months
1. Analyzed Hadoop clusters using big data analytics tools, including Hive and MapReduce
2. Conducted in-depth research on Hive to analyze partitioned and bucketed data
3. Leveraged Sqoop to import data from RDBMSs into HDFS
4. Developed an ETL/ELT framework using shell scripts and Hive (including daily runs, error handling, and logging) to extract clean, usable data that improved vendor negotiations
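A daily-run framework of this kind can be sketched as a small runner with per-step error handling and logging. This is a minimal illustrative stand-in, not the actual shell/Hive framework; the step names and the `run_pipeline` contract are assumptions.

```python
"""Minimal sketch of a daily ETL run with error handling and logging.
Step names and the run_pipeline contract are illustrative assumptions,
not the resume's actual shell/Hive framework."""
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("daily_etl")

def run_pipeline(steps):
    """Run each (name, callable) step in order; log and halt on failure."""
    completed = []
    for name, step in steps:
        try:
            log.info("starting step: %s", name)
            step()
            completed.append(name)
            log.info("finished step: %s", name)
        except Exception:
            log.exception("step failed: %s", name)
            break  # halt the daily run so a bad load is not propagated
    return completed

# Usage: stand-in steps for extract / clean / load
steps = [
    ("extract", lambda: None),
    ("clean", lambda: None),
    ("load", lambda: None),
]
result = run_pipeline(steps)
```

Halting on the first failed step mirrors a common daily-batch design choice: a partial load is easier to recover from than a corrupted downstream table.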
5. Performed cleaning and filtering of imported data using Hive and MapReduce
6. Regularly tuned Hive queries to improve data processing and retrieval performance
7. Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables in the EDW
8. Wrote a PySpark Streaming client that emits RDDs from a Kafka topic between defined start and end offsets
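The start/end-offset semantics behind such a client can be shown with plain Python: Kafka reads are bounded by an inclusive start offset and an exclusive end offset. The actual client used PySpark Streaming; the records and offsets below are made up for illustration.

```python
"""Illustrative stand-in for an offset-bounded Kafka read. The real client
used PySpark Streaming; this plain-Python sketch only demonstrates the
inclusive-start / exclusive-end offset convention."""

def read_between_offsets(records, start, end):
    """Return values whose Kafka-style offset lies in [start, end)."""
    return [value for offset, value in records if start <= offset < end]

# Usage: a tiny simulated topic partition of (offset, value) pairs
partition = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
window = read_between_offsets(partition, 1, 3)  # → ["b", "c"]
```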
9. Formulated a next-generation analytics environment: a self-service, centralized platform for all data-centric activities that provides a full 360-degree view of customers, from product usage to back-office transactions
10. Worked in both cloud (AWS, GCP, and Azure) and on-premises environments
11. Constructed product-usage datasets and aggregations using PySpark, Scala, Spark SQL, and HiveContext in partitioned Hive external tables stored in an AWS S3 location, for reporting, data science dashboarding, and ad-hoc analyses
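The partitioned external tables above rely on Hive's directory layout, where each partition is a `key=value` directory under the table's storage location. The sketch below reproduces that layout on local disk as a stand-in for the S3 location; the column names and file naming are illustrative assumptions.

```python
"""Sketch of the Hive-style partition layout used by partitioned external
tables: rows land under key=value directories (e.g. dt=2024-01-01/).
A local temp directory stands in for the S3 location; the columns and
file names are illustrative, not from the resume."""
import os
import tempfile
from collections import defaultdict

def write_partitioned(rows, base_dir, partition_col):
    """Group rows by the partition column and write one file per
    key=value directory, as Hive expects for a partitioned table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)
    paths = []
    for value, part_rows in groups.items():
        part_dir = os.path.join(base_dir, f"{partition_col}={value}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.csv")
        with open(path, "w") as f:
            for row in part_rows:
                f.write(",".join(str(row[c]) for c in sorted(row)) + "\n")
        paths.append(path)
    return sorted(paths)

# Usage: two days of product-usage rows partitioned by date
rows = [
    {"dt": "2024-01-01", "user": "u1", "clicks": 3},
    {"dt": "2024-01-02", "user": "u2", "clicks": 5},
]
base = tempfile.mkdtemp()
files = write_partitioned(rows, base, "dt")
```

Because the partition value lives in the directory name rather than the file, Hive (and Spark SQL) can prune entire directories when a query filters on the partition column.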