Data Engineer - Specialist
Agilisium
Project 1
Client: Lucerna Health
Project Type: Enhancements, Migration and Optimization
Tools: AWS, Spark
Team size: 4
Role: Senior Data Engineer
Roles and Responsibilities:
- Developed Spark code to ingest source data and recode source values into the targeted values (see the sketch after this list).
- Built AWS Glue jobs to run the pipelines.
- Optimized batch processing and reduced cost.
- Made various logic changes to speed up processing.
- Optimized and tuned the Databricks environment, enabling multithreading to run multiple queries concurrently.
- Migrated Glue jobs to EMR and adapted the Spark code accordingly.
- Published data to Redshift.
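A minimal sketch of the ingest-and-recode pattern described in this list; the S3 paths, column name, and value mapping are hypothetical, not from the actual project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest_recode").getOrCreate()

# Ingest raw source data from S3 (path is illustrative).
df = spark.read.parquet("s3://source-bucket/raw/members/")

# Recode source values into the targeted values via a literal map;
# unmapped values fall through unchanged.
recode_map = {"M": "Male", "F": "Female", "U": "Unknown"}
mapping = F.create_map(*[F.lit(x) for kv in recode_map.items() for x in kv])
df = df.withColumn("gender", F.coalesce(mapping[F.col("gender")], F.col("gender")))

# Publish the recoded data to the curated zone.
df.write.mode("overwrite").parquet("s3://target-bucket/curated/members/")
```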
Data Engineer
Agilisium
Project 2
Client: AMGEN
Project Type: Upgrade to a New Architecture
Tools: AWS, Databricks, Airflow
Team size: 12
Role: Module Lead
Roles and Responsibilities:
- Implemented data ingestion and its processing into multiple Common Data Layers.
- Implemented lightweight Airflow orchestration with static DAGs (see the DAG sketch after this list).
- Developed several Databricks notebooks to process data during Ingestion and Transcription, followed by application of business rules.
- Developed the logic for data quality checks.
- Migrated 1,200+ tables (approx. 400 TB of data) from Hive to Delta (see the migration sketch below).
- Optimized and tuned the Databricks environment, enabling multithreading to run multiple queries concurrently.
- Set up cluster configurations, enabling auto-termination and autoscaling.
- Published data to Redshift and Redshift Spectrum.
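A minimal sketch of a lightweight, static Airflow DAG as mentioned above; the DAG id, schedule, and task names are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cdl_pipeline",            # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(task_id="ingest", bash_command="echo ingest")
    transcribe = BashOperator(task_id="transcribe", bash_command="echo transcribe")
    dq_check = BashOperator(task_id="dq_check", bash_command="echo dq_check")

    # Static task graph: the dependency structure is fixed in code
    # rather than generated dynamically at parse time.
    ingest >> transcribe >> dq_check
```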
Technologies: Databricks, Airflow, PySpark, AWS S3, Amazon Redshift, Redshift Spectrum
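A minimal sketch of the Hive-to-Delta migration loop, assuming the source tables are Parquet-backed and that `spark` is the Databricks-provided session; the database name is illustrative:

```python
# List tables registered in the (hypothetical) Hive database.
tables = [row.tableName for row in spark.sql("SHOW TABLES IN hive_db").collect()]

for t in tables:
    # CONVERT TO DELTA upgrades a Parquet table in place; non-Parquet
    # formats would need a read-and-rewrite migration instead.
    spark.sql(f"CONVERT TO DELTA hive_db.{t}")
    # Compact small files left behind by the old layout.
    spark.sql(f"OPTIMIZE hive_db.{t}")
```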
Data Engineer
Agilisium
Project 3
Client: AMGEN
Project Type: Fully Processed
Tools: AWS, Databricks
Team Size: 3
Role: Project Lead
Roles and Responsibilities:
- Analyzed each AWS service that could be optimized to reduce cost.
- Projected savings of approx. USD 50K per month.
- Prepared the project plan and SOW, and obtained sign-off from the client.
- Cleaned up AWS S3 storage by moving approx. 500+ TB of data to Glacier Deep Archive.
- Terminated EC2 instances with low or no utilization (see the sketch after this list).
- Implemented a data retention policy for continuous purging of historical data.
- Purged Redshift schemas.
- Optimized some of the existing processes to reduce cluster and storage cost.
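A minimal sketch of flagging low-utilization EC2 instances via CloudWatch metrics; the 5% CPU threshold and 14-day window are illustrative, not the project's actual criteria:

```python
from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2")
cw = boto3.client("cloudwatch")
end = datetime.utcnow()
start = end - timedelta(days=14)

for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            stats = cw.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
                StartTime=start,
                EndTime=end,
                Period=86400,            # one datapoint per day
                Statistics=["Average"],
            )
            averages = [p["Average"] for p in stats["Datapoints"]]
            # Flag instances whose daily average CPU never exceeded 5%.
            if averages and max(averages) < 5.0:
                print("Termination candidate:", instance["InstanceId"])
```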
Technologies: MicroStrategy, Business Objects, Google BigQuery, SQL Server
Data Engineer
Agilisium
Project 4
Client: AMGEN
Project Type: Fully Processed
Tools: AWS, Databricks
Team Size: 3
Role: Project Lead
Period:
Roles and Responsibilities:
- Analyzed S3 buckets where storage was unused and could be purged.
- Cleaned up 700+ TB of data, saving approx. USD 16K per month.
- Prepared a data retention policy to clean up S3 storage after a defined time period.
- Implemented an automated process to clean up storage.
- Created an S3 lifecycle policy to move data to Glacier Deep Archive (see the lifecycle sketch after this list).
- Scheduled Databricks jobs to run the notebooks, delete unused data, and move production data to Glacier (see the notebook sketch below).
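A minimal sketch of the S3 lifecycle rule described above; the bucket name, prefix, and 90-day threshold are illustrative:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="prod-data-bucket",          # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-historical-data",
                "Filter": {"Prefix": "historical/"},
                "Status": "Enabled",
                # Transition objects to Glacier Deep Archive after 90 days.
                "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    },
)
```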
Technologies: AWS S3, Databricks, PySpark, Hive
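A minimal sketch of the scheduled cleanup notebook, assuming a hypothetical staging path and a 180-day retention window; `dbutils` is the utility object Databricks injects into notebooks:

```python
import time

RETENTION_DAYS = 180
cutoff_ms = (time.time() - RETENTION_DAYS * 86400) * 1000

for entry in dbutils.fs.ls("s3://prod-data-bucket/staging/"):
    # modificationTime is epoch milliseconds on recent Databricks runtimes.
    if entry.modificationTime < cutoff_ms:
        dbutils.fs.rm(entry.path, True)  # recursive delete
```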