Akash Srivastava has been working professionally in the fields he loves: software and data. His work includes enhancing and optimizing existing systems to tune project performance and meet the prerequisites of advanced data pipelines for ETL-based systems; upgrading an old system to a new, advanced architecture that handles data ingestion, transcription, and publishing of processed data ready for business consumption; reducing AWS costs by eliminating unused services and processes; and continuously purging S3 storage where data is older than the defined retention period. All in all, Akash Srivastava is an engaging, intense communicator with a passion for knowledge and understanding.
+ years
Data Engineering
Data Science
Amazon Web Services (AWS)



Pandas, PySpark


PyCharm, Aginity, Databricks, Hive, Delta Lake, Airflow


Python, SQL, Scala, R, Spark


Data Science



Amazon Web Services (AWS), Databricks


MySQL, PostgreSQL, MongoDB, Hive, Delta Lake


Industry Expertise

Pharma, Medical

Professional experience

Data Engineer - Specialist


Project 1                                   

Client: Lucerna Health

Project Type: Enhancements, Migration and Optimization

Tools: AWS, Spark    

Team size: 4

Role: Senior Data Engineer

Roles and Responsibilities:

  • Developed Spark code to ingest source data and recode source values into target values.
  • Built Glue jobs to run the pipelines.
  • Optimized batch processing and reduced cost.
  • Made various logic changes to speed up processing.
  • Optimized and tuned the Databricks environment, enabling multithreading to run multiple queries at a time.
  • Migrated Glue jobs to EMR and adapted the Spark code accordingly.
  • Published data to Redshift.

Data Engineer


Project 2

Client: AMGEN                              

Project Type: Upgrade to New Architecture

Tools: AWS, Databricks, Airflow

Team size: 12

Role: Module Lead             

Roles and Responsibilities:

  • Implemented data ingestion and its processing into multiple Common Data Layers.
  • Implemented lightweight Airflow with static DAGs.
  • Developed several Databricks notebooks to process data during ingestion and transcription, followed by business rules.
  • Developed the logic for data quality checks.
  • Migrated 1,200+ tables, approx. 400 TB of data, from Hive to Delta.
  • Optimized and tuned the Databricks environment, enabling multithreading to run multiple queries at a time.
  • Set up cluster configuration, enabling auto-termination and autoscaling.
  • Published data to Redshift and Redshift Spectrum.

Technologies: Databricks, Airflow, PySpark, AWS S3, Amazon Redshift, Redshift Spectrum
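
The multithreaded query execution mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: `run_queries_concurrently` and its `run_query` parameter are hypothetical names, and on Databricks the callable would presumably be something like `spark.sql`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries_concurrently(run_query, queries, max_workers=4):
    """Submit each query to a thread pool and return results in input order.

    run_query is a hypothetical callable (for example spark.sql on Databricks);
    it is an assumption for illustration, not taken from the project.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `queries` in its results
        return list(pool.map(run_query, queries))


if __name__ == "__main__":
    # Stand-in for a real query runner so the sketch runs anywhere
    def fake_runner(sql):
        return f"ran: {sql}"

    print(run_queries_concurrently(fake_runner, ["SELECT 1", "SELECT 2"]))
```

Threads suit this case because the driver mostly waits on the cluster while each query runs, so several queries can be in flight at once.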

Data Engineer


Project 3

Client: AMGEN

Project Type: Fully Processed

Tools: AWS, Databricks

Team size: 3

Role: Project Lead

Roles and Responsibilities:

  • Analyzed each AWS service that could be optimized to reduce cost.
  • Projected savings of approx. USD 50K per month.
  • Prepared the project plan and SOW and obtained sign-off from the client.
  • Cleaned up AWS S3 storage by moving 500+ TB of data to Glacier Deep Archive.
  • Terminated EC2 instances that were unused or had low utilization.
  • Implemented a data retention policy for continuous purging of historical data.
  • Purged Redshift schemas.
  • Optimized some existing processes to reduce cluster and storage costs.

Technologies: MicroStrategy, Business Objects, Google BigQuery, SQL Server

Data Engineer


Project 4

Client: AMGEN

Project Type: Fully Processed

Tools: AWS, Databricks

Team size: 3

Role: Project Lead

Period:

Roles and Responsibilities:

  • Analyzed S3 buckets to identify unused storage that could be purged.
  • Cleaned up 700+ TB of data to save approx. USD 16K per month.
  • Prepared a data retention policy to clean S3 storage after a defined time period.
  • Implemented an automated process to clean storage.
  • Created an S3 Lifecycle policy to move data to Glacier Deep Archive.
  • Scheduled Databricks jobs to run the notebooks, delete unused data, and move production data to Glacier.

Technologies: AWS S3, Databricks, PySpark, Hive
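
As an illustration of the retention and lifecycle work described above, the sketch below shows one way the two pieces might be expressed. It is an assumption-laden example rather than the project's implementation: `select_expired_keys` and `deep_archive_rule` are hypothetical helpers, the object dictionaries mimic the `Key` and `LastModified` fields of an S3 object listing, and the prefix and day counts are made up.

```python
from datetime import datetime, timedelta, timezone


def select_expired_keys(objects, retention_days, now=None):
    """Return keys whose LastModified is older than the retention window.

    `objects` is assumed to be a list of dicts with "Key" and a
    timezone-aware "LastModified", as in an S3 object listing.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [obj["Key"] for obj in objects if obj["LastModified"] < cutoff]


def deep_archive_rule(prefix, days, rule_id="archive-to-deep-archive"):
    """Build a lifecycle configuration that transitions objects under
    `prefix` to Glacier Deep Archive after `days` days.

    The rule ID, prefix, and day count are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    }
```

A dictionary in this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument, while the key selection step would feed whatever delete job is scheduled.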



Customer testimonials

"Top-notch candidates, helped me save a lot of time by providing the most qualified resources. They also simplify payroll, find solutions that work for you, and come at a really competitive price point."
"CoCreator perfectly understood the role we required and helped us find the perfect AWS candidates, saving us plenty of time and resources. They work well and have provided an excellent service."