Description:
Azure-based contact center analytics project built with Databricks, Data Factory, SQL, and PySpark. Used PySpark to clean call transcripts, ran sentiment analysis with Azure Cognitive Services, and stored the enriched data in Delta Lake for customer service analysis.
Responsibilities:
• Designed and implemented Azure Data Factory pipelines to ingest data from on-premises databases into ADLS Gen2. Parameterized the pipelines for flexibility and used activities such as Lookup for data enrichment, Set Variable for dynamic configuration, Switch for conditional branching, and ForEach for iterating over data sets, keeping data movement efficient, adaptable, and scalable.
• Used Databricks for data transformation and interactive analysis. Implemented custom widgets to control data processing parameters dynamically, streamlining workflows and enabling efficient exploration of insights.
• Designed and implemented data pipelines in Databricks to model complex business and managerial hierarchies. Used PySpark's DataFrame API, self-joins, User-Defined Functions (UDFs), and window functions to establish relationships, calculate reporting levels, and identify team structures within the organization.
• Spearheaded the migration of SAS ETL to the cloud, using tools like SASware Migrate to PySpark for automated conversion and manual adaptation for complex logic. Ensured a smooth transition through integration with cloud workflows and data validation.
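As an illustration of the hierarchy modeling mentioned above, here is a minimal plain-Python sketch of how reporting levels and team membership can be derived from employee-to-manager pairs. It is not the project's code: the `REPORTS_TO` data and both function names are hypothetical, and each pass of the loop stands in for what one self-join would do in PySpark.

```python
from collections import defaultdict

# Hypothetical (employee -> manager) pairs; None marks the top of the hierarchy.
REPORTS_TO = {
    "ceo": None,
    "vp_sales": "ceo",
    "vp_eng": "ceo",
    "sales_lead": "vp_sales",
    "engineer": "vp_eng",
}


def reporting_levels(reports_to):
    """Return {employee: level}, with the top of the hierarchy at level 0."""
    levels = {emp: 0 for emp, mgr in reports_to.items() if mgr is None}
    frontier = set(levels)
    # Each pass mirrors one self-join: attach employees whose manager
    # was resolved in the previous pass.
    while frontier:
        next_frontier = set()
        for emp, mgr in reports_to.items():
            if mgr in frontier and emp not in levels:
                levels[emp] = levels[mgr] + 1
                next_frontier.add(emp)
        frontier = next_frontier
    return levels


def teams_by_manager(reports_to):
    """Group direct reports under each manager."""
    teams = defaultdict(list)
    for emp, mgr in reports_to.items():
        if mgr is not None:
            teams[mgr].append(emp)
    return {mgr: sorted(members) for mgr, members in teams.items()}


if __name__ == "__main__":
    print(reporting_levels(REPORTS_TO))
    print(teams_by_manager(REPORTS_TO))
```

Employees whose manager chain never reaches a top-level entry (for example, because of a cycle in the data) are simply left out of the result, which makes such records easy to flag during validation.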
Toolkit: PySpark, SQL, Delta Tables, ADLS, Azure Databricks, AWS S3, Lakehouse, ETL, Spark Dataframes, Notebooks, Delta Files, Airflow.
Description:
Designed and implemented a cloud-based ETL pipeline in Airflow and Databricks to consolidate voice-of-customer data. The pipeline ingests data from several sources, including the Qualtrics API, and transforms it in Databricks for analytics and sentiment analysis. The enriched data is then loaded into Delta tables for efficient storage and further analysis, giving the business a centralized, reliable source of customer insights for data-driven decision making.
Responsibilities:
• Built data pipelines using Databricks and Airflow to integrate with third-party APIs. These pipelines ingest data, store it in AWS S3, and use Delta Lake on S3 to create structured tables for efficient data analysis.
• Design and implement complex data transformations using SQL-like syntax for efficient data manipulation at scale. Skilled in filtering, joins, aggregations, window functions, and UDFs to cleanse, enrich, and prepare data for downstream analytics and machine learning projects.
• Use Airflow to automate complex data workflows. Design and schedule Directed Acyclic Graphs (DAGs) with diverse operators, ensuring reliable task execution and efficient data processing pipelines.
• Effectively gather and translate stakeholder requirements into actionable technical solutions. Actively participate in discussions, ask clarifying questions, and ensure alignment between technical capabilities and business needs.
Skills: PySpark, SQL, Delta Tables, ADLS, Azure Databricks, AWS S3, Lakehouse, ETL, Spark Dataframes, Notebooks, Delta Files, Airflow.
- Company industry:
- Information technology services