Shivasai Batharaju
An accomplished Data Engineer with experience architecting and deploying robust, scalable data pipelines. I specialize in Machine Learning and Artificial Intelligence initiatives, using cloud platforms such as GCP and AWS to turn complex data into intelligent, actionable solutions. My focus is on driving business value through efficient data management and innovative analytical applications.
Transforming Data Into Intelligent Solutions
I help companies harness the power of their data by turning complex problems into intelligent solutions that drive optimal performance. With 3 years of experience as a Data Engineer, I design, develop, and optimize end-to-end data pipelines that deliver accuracy, efficiency, and actionable insights.
My expertise spans cloud platforms (GCP, AWS), Big Data tools (Spark, Hadoop, Kafka), and advanced ML/DL models. I've delivered high-impact solutions including sentiment analysis, fraud detection, and predictive modeling across healthcare and financial services.
Top Skills
  • Natural Language Processing (NLP)
  • Docker & Kubernetes
  • Apache Spark (PySpark, Scala)
  • Python & SQL
  • Cloud Infrastructure (GCP/AWS)
Technical Expertise
Cloud Platforms
Building database instances, storage solutions, and scalable data pipelines on GCP and AWS using RDS, S3, BigQuery, and Cloud Dataflow.
Big Data Tools
Proficient in Spark (PySpark, Scala), Hadoop, and Kafka for processing and analyzing large-scale datasets efficiently.
ML/AI Development
Machine Learning and Deep Learning models for NLP, sentiment analysis, fraud detection, and predictive modeling.
ETL & Orchestration
Experience with AWS Glue, Matillion, Informatica PowerCenter, DBT, and Apache Airflow for robust data pipeline orchestration.
High-Impact ML Solutions Delivered
30%
Fraud Detection Accuracy
Increased accuracy in detecting fraudulent card transactions through advanced anomaly detection models.
25%
ATM Cash Forecasting
Improved prediction accuracy for ATM cash demand, reducing logistics costs significantly.
40%
Learner Retention
Boosted retention through a collaborative recommendation engine built on the Learning DNA algorithm.
60%
Query Latency Reduction
Optimized Firestore sharding and ETL pipelines for faster data retrieval.
These solutions span healthcare, financial services, and education sectors, delivering measurable business value through data-driven innovation.
Professional Experience
1
Express Scripts Pharmacy Benefit Services
Data Engineer | January 2025 - Present
Specializing in advanced ML/AI solutions and scalable data pipelines on GCP and AWS. Developed a BERT-based sentiment analysis model, an ATM cash demand prediction model, and fraud detection models. Engineered pipelines with Apache Airflow and managed cloud infrastructure with Kubernetes integration.
2
Grasp Technologies Pvt. Ltd.
Data Engineer | March 2021 - November 2022
Focused on intelligent system design and data optimization. Designed an NLP-driven chatbot, developed a sentiment analysis model in Spark, loaded large datasets into BigQuery via Cloud Dataflow, and built predictive models to identify high-risk customers.
MindSpace – Data Engineering Leadership
Project Overview
As Data Engineering & Analytics Lead for MindSpace, a scalable micro-certification platform, I architected the data platform supporting real-time insights and personalized learning experiences. The platform integrates adaptive learning, blockchain-verified credentials, and employer-grade validation.
Duration: March 2025 – May 2025
Institution: Indiana Institute of Technology
01
Real-Time Data Pipelines
Designed pipelines using Firebase + ClickHouse, enabling analytics at 1.2M events/min for engagement and retention tracking.
02
Recommendation Engine
Built a collaborative recommendation system using the Learning DNA algorithm, improving course relevance and boosting retention by 40%.
03
Blockchain Integration
Developed verification-ready employer APIs integrated with blockchain-based certification analytics for credential validation.
04
Monitoring & Optimization
Implemented anomaly detection using Grafana and Prometheus with 92% detection accuracy, and reduced query latency by 60%.
Academic Projects & Innovation
Data Security via Steganography
Built a Python-based system that secures sensitive data by embedding it into images using LSB steganography. Implemented the full workflow for encryption, extraction, and integrity validation across multiple formats.
Tech: Python, Tkinter, PIL, LSB Steganography
Object Detection for Visually Impaired
Developed an Android app using TensorFlow Lite for real-time object identification through the smartphone camera. Integrated GPU-optimized detection with text-to-speech (TTS) audio feedback to improve safety and mobility.
Tech: Android Studio, TensorFlow Lite, TTS API
Education & Certifications
Master's Degree
Information Systems
Indiana Institute of Technology
January 2024 - May 2025
Bachelor's Degree
Computer Science
Bharat Institute of Engineering & Technology
August 2018 - August 2022
Professional Certifications
  • SQL (Advanced) - Demonstrated expertise in complex query optimization and database management
  • Python (Basic) - Foundation in programming and scripting for data engineering tasks
Core Competencies & Technologies
Data Engineering
End-to-end pipeline design, ETL optimization, data quality automation, and scalable architecture for high-volume data processing.
Machine Learning
NLP, sentiment analysis, fraud detection, predictive modeling, and deep learning implementation using TensorFlow and custom algorithms.
Cloud Infrastructure
GCP and AWS deployment, Kubernetes orchestration (EKS/GKE), Docker containerization, and serverless architecture design.
Analytics & Monitoring
Real-time analytics, anomaly detection, Grafana/Prometheus monitoring, and automated testing strategies for data quality assurance.
Programming
Python, SQL, PySpark, Scala, Java/Kotlin for building robust, maintainable data solutions across diverse technology stacks.
Orchestration Tools
Apache Airflow, AWS Glue, Matillion, DBT, and Informatica PowerCenter for workflow automation and pipeline management.
Let's Build Something Innovative Together
I'm passionate about leveraging data engineering and machine learning to solve complex business challenges. With proven expertise in building scalable pipelines, deploying ML models, and optimizing cloud infrastructure, I'm ready to drive innovation and strategic growth.
Whether you're looking to enhance data infrastructure, implement intelligent solutions, or scale analytics capabilities, I bring the technical expertise and business acumen to deliver measurable results.
Location
United States
Experience
3+ Years
Focus Areas
ML/AI, Cloud, Big Data