Full Stack Engineer at Fidelity Investments

Hi, I'm Rhea Bhatia

Building scalable applications, real-time data pipelines, and machine learning systems that transform complex data into actionable insights.

About Me

I'm a Full Stack Engineer at Fidelity Investments, where I build and optimize a high-performance, real-time trading application used by thousands of users.

I enjoy working across the stack, from designing backend systems and APIs to building user-facing interfaces, with a focus on performance, scalability, and reliability.

Outside of work, I'm deeply interested in data science, machine learning, and data engineering. I enjoy building projects that explore data analysis, model development, and real-time data systems, especially when they involve turning large volumes of raw data into meaningful insights.

Areas of Interest

Full Stack Systems

(Backend-focused)

Data Science & Machine Learning

Data Engineering & Real-Time Pipelines

AI-Driven Applications

Featured Projects

Deep dives into building scalable data engineering pipelines and machine learning systems from scratch to production.

Machine Learning · Data Science · Model Deployment · Backend Systems

Predictive Churn Engine

End-to-end data science and machine learning system for the telecom industry, built with XGBoost and deployed as a monitored FastAPI service.

View Full Breakdown
Data Engineering · Distributed Systems · Real-Time Systems · Backend Systems

AI-Powered Real-Time Log Monitoring System

Real-time data pipeline and AI-powered system for log ingestion, anomaly detection, and incident analysis using Kafka, ClickHouse, and LLaMA 70B.

View Full Breakdown
Machine Learning · Data Science · Model Deployment · Backend Systems

Predictive Churn Engine

End-to-end data science and machine learning system with API deployment and monitoring

Python · Pandas · NumPy · Scikit-learn · XGBoost · FastAPI · Docker · Pytest

🚨 Problem

In the telecom industry, customer churn directly impacts revenue, but identifying at-risk customers is challenging due to the complex interplay of behavioral and financial factors, including contract type, tenure, service usage, and billing patterns.

Without a data-driven system, telecom providers fall back on reactive retention strategies: early warning signals of churn go undetected, revenue is lost, and opportunities for targeted customer retention are missed.

🚀 Key Highlights

  • Conducted exploratory data analysis (EDA) to identify key churn drivers
  • Engineered features to better capture customer behavior patterns
  • Built and optimized XGBoost model (ROC-AUC: 0.85)
  • Tuned decision threshold (0.33) for recall-focused predictions
  • Deployed model via FastAPI with drift detection and automated testing

🔍 Full Breakdown

📊 Data Exploration & Insights

  • Analyzed customer behavior across tenure, contract type, and billing
  • Identified high-risk churn segments such as month-to-month contracts, short-tenure customers, and high monthly charges
  • Validated insights through visual EDA and statistical analysis
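
The segment analysis above boils down to grouped churn rates. A minimal sketch with a toy sample standing in for the real customer table; the column names (`Contract`, `tenure`, `Churn`) are assumptions for illustration:

```python
import pandas as pd

# Toy rows standing in for the real customer table; column names
# ("Contract", "tenure", "Churn") are illustrative assumptions.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "tenure":   [2, 5, 30, 48, 1, 24],
    "Churn":    [1, 1, 0, 0, 1, 1],
})

# Churn rate per contract type: the kind of aggregation used to
# surface high-risk segments during EDA.
churn_by_contract = df.groupby("Contract")["Churn"].mean()
print(churn_by_contract)
```

In this toy sample, month-to-month customers churn at the highest rate, mirroring the segment insight above.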

🧬 Feature Engineering

  • Created TotalChargesPerMonth to capture spending behavior over time
  • Cleaned dataset by handling missing and non-numeric values
  • Applied preprocessing pipeline with one-hot encoding and feature scaling
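
The `TotalChargesPerMonth` feature and the non-numeric cleanup can be sketched as follows; the raw `TotalCharges` values and the zero-tenure guard are illustrative assumptions:

```python
import pandas as pd

# Illustrative rows; in datasets like this, "TotalCharges" often
# arrives as strings with blanks, so coercion to numeric is needed.
df = pd.DataFrame({
    "tenure": [12, 0, 24],
    "TotalCharges": ["600.0", " ", "1800.0"],
})

# Clean: non-numeric entries become NaN, then fill with 0.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)

# Engineered feature: average spend per month of tenure
# (guarding against division by zero for brand-new customers).
df["TotalChargesPerMonth"] = df["TotalCharges"] / df["tenure"].replace(0, 1)
print(df["TotalChargesPerMonth"].tolist())  # [50.0, 0.0, 75.0]
```

One-hot encoding and scaling would then be applied on top of the cleaned frame, as described above.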

🤖 Modeling & Evaluation

  • Trained Logistic Regression (baseline) and XGBoost (final model)
  • Performed hyperparameter tuning using RandomizedSearchCV
  • Achieved ROC-AUC of 0.85, accuracy of 78.1%, and F1 score of 0.65
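
The tuning step can be sketched with `RandomizedSearchCV` on synthetic data; the project tunes an XGBoost model, but scikit-learn's `GradientBoostingClassifier` is used here as a stand-in so the sketch has no extra dependency, and the parameter ranges are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; the real project uses the telecom dataset
# and an XGBoost model, which slots into the same search API.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_dist = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1, 0.2],
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,           # small budget for the sketch
    scoring="roc_auc",  # same metric the project reports
    cv=3,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Randomized search samples a fixed number of configurations instead of exhaustively trying the grid, which keeps tuning cheap as the parameter space grows.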

⚖️ Business Optimization

  • Tuned classification threshold to 0.33 to prioritize recall
  • Focused on identifying high-risk customers for retention strategies
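
The effect of moving the decision threshold from the default 0.5 down to 0.33 can be shown in a few lines of plain Python; the probabilities and labels below are illustrative, not the project's data:

```python
# Predicted churn probabilities and true labels for a handful of
# customers (illustrative numbers only).
probs  = [0.10, 0.25, 0.35, 0.40, 0.60, 0.80]
labels = [0,    0,    1,    1,    0,    1]

def recall_at(threshold):
    preds = [int(p >= threshold) for p in probs]
    true_pos = sum(p and l for p, l in zip(preds, labels))
    return true_pos / sum(labels)

# Lowering the threshold flips borderline customers (p = 0.35, 0.40)
# to "churn", trading precision for recall.
print(recall_at(0.5), recall_at(0.33))
```

At 0.5 only one of the three true churners is caught; at 0.33 all three are, which is the trade-off a retention team usually wants.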

🚀 Production System

  • Built FastAPI service with endpoints /health, /model-info, and /predict
  • Implemented input validation using Pydantic
  • Designed API for seamless frontend or service integration
  • Added drift detection script for monitoring model performance
  • Containerized with Docker and added pytest-based tests
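
The drift-detection bullet can be sketched as a Population Stability Index (PSI) check on a feature's serving-time distribution; this is an assumed approach for illustration and the project's actual drift script may differ:

```python
import math

def psi(expected, actual, bins=5):
    """Population Stability Index between a training-time sample and
    a serving-time sample of one feature; a common drift signal
    (rule of thumb: PSI > 0.2 suggests meaningful drift)."""
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / bins or 1.0

    def dist(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # Smooth empty buckets so the log below is defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = dist(expected), dist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [10, 12, 11, 13, 12, 11, 10, 12]   # training-time feature values
live_same = [11, 12, 10, 13]               # similar serving distribution
live_shifted = [30, 32, 31, 29]            # clearly shifted distribution

print(psi(train, live_same), psi(train, live_shifted))
```

A monitoring job would run a check like this per feature on recent prediction inputs and alert when the score crosses the threshold.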

📈 Impact

  • Enables proactive churn prediction using data-driven insights
  • Demonstrates full lifecycle from data exploration to deployment and monitoring
  • Bridges data science and backend engineering in a production-ready system
View on GitHub
Data Engineering · Distributed Systems · Real-Time Systems · Backend Systems

AI-Powered Real-Time Log Monitoring System

Real-time data pipeline and AI-powered system for log ingestion, anomaly detection, and incident analysis

Kafka / Redpanda · ClickHouse · Python · Streamlit · Groq (LLaMA 70B) · Docker

🚨 Problem

Modern distributed systems (e.g., microservices architectures and trading platforms) generate millions of logs per minute, making it extremely difficult to identify meaningful patterns and detect critical issues in real time.

Failures such as API timeouts, database outages, and infrastructure bottlenecks are often buried in noisy log streams, leading to delayed incident detection and inefficient debugging workflows.

🚀 Key Highlights

  • Built a real-time log ingestion pipeline using Kafka and ClickHouse
  • Designed stream processing workflow for continuous log analysis
  • Implemented anomaly detection using time-based error spike patterns
  • Integrated LLM for automated root cause and incident report generation
  • Built interactive frontend dashboard using Streamlit for real-time visualization

🔍 Full Breakdown

📊 Data Exploration & Insights

  • Analyzed log distributions across severity levels (INFO, WARN, ERROR)
  • Identified error spikes in short time windows as key anomaly indicators
  • Designed aggregation queries to surface high-frequency system failures

🧬 Feature Engineering

  • Structured logs into queryable schema (event_id, level, message, timestamp)
  • Created materialized views for fast aggregation in ClickHouse
  • Engineered time-based features (e.g., errors per minute) for anomaly detection

🤖 Modeling & Analysis

  • Implemented rule-based anomaly detection using statistical thresholds
  • Used an LLM (LLaMA 70B via Groq) for root cause analysis, correlation detection, and impact assessment
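
The statistical spike rule can be sketched in plain Python; the per-minute error counts below are illustrative, and in the real system they come from a ClickHouse aggregation over the log stream:

```python
from statistics import mean, stdev

# Errors-per-minute counts (illustrative); a sustained baseline with
# one sharp spike at minute index 6.
errors_per_minute = [3, 2, 4, 3, 2, 3, 25, 3, 2]

mu = mean(errors_per_minute)
sigma = stdev(errors_per_minute)
threshold = mu + 2 * sigma  # simple mean + k*sigma spike rule

anomalous_minutes = [i for i, c in enumerate(errors_per_minute)
                     if c > threshold]
print(anomalous_minutes)  # minute indices whose error count spiked
```

Only the minutes flagged here would be handed to the LLM for root cause analysis, which is what keeps AI cost and latency down.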

⚖️ System Optimization

  • Triggered AI analysis only during anomalies to reduce cost and latency
  • Prioritized real-time responsiveness and efficient querying

🚀 Production System

  • Designed distributed pipeline: Producer → Kafka → Consumer → ClickHouse
  • Built real-time dashboard using Streamlit
  • Containerized services using Docker
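
The Producer → Kafka → Consumer → ClickHouse flow can be sketched with an in-memory queue standing in for the Kafka topic and a list standing in for the ClickHouse table, so the stages are runnable without a broker; the real pipeline uses Kafka/Redpanda clients and ClickHouse inserts:

```python
import json
import queue

topic = queue.Queue()  # in-memory stand-in for the Kafka topic
sink = []              # stand-in for the ClickHouse table

def produce(event_id, level, message):
    # Real producer: serialize and publish to Kafka/Redpanda.
    topic.put(json.dumps({"event_id": event_id,
                          "level": level,
                          "message": message}))

def consume():
    # Real consumer: poll the topic and INSERT rows into ClickHouse.
    while not topic.empty():
        sink.append(json.loads(topic.get()))

produce(1, "INFO", "request served")
produce(2, "ERROR", "db timeout")
consume()
print([r["level"] for r in sink])
```

Decoupling the producer and consumer through the topic is what lets ingestion keep up during bursts while the consumer drains at its own pace.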

📈 Impact

  • Enables faster detection of system failures through real-time analytics
  • Reduces manual debugging using automated AI-driven insights
  • Demonstrates production-style data engineering pipelines, backend systems, and AI integration
View on GitHub