Portfolio

A collection of projects in NLP, machine learning and data science

AI-Powered Patient Journey Analytics

This project builds an end-to-end GenAI pipeline that transforms 50 synthetic IBD patient interview transcripts into structured, analyzable records using LLM-powered extraction with Pydantic schemas, chain-of-thought prompting, and confidence scoring. The pipeline combines classical NLP (TF-IDF, LDA/NMF topic modeling, VADER sentiment) with structured LLM extraction (Gemini 2.5 Flash via instructor + litellm) to answer four research questions about biologic treatment adoption, barriers, treatment patterns, and referral pathways.
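
As a rough illustration of schema-based extraction with confidence scoring, the sketch below validates an extracted record and flags low-confidence fields for review. It is not the project's code: it uses standard-library dataclasses as a stand-in for Pydantic so it runs without dependencies, and the field names (`biologic_used`, `main_barrier`) and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class ExtractedField:
    """One extracted value plus the model's self-reported confidence."""
    value: Optional[str]
    confidence: float  # must lie in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Reject malformed LLM output early instead of storing it.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical structured record for one interview transcript."""
    patient_id: str
    biologic_used: ExtractedField
    main_barrier: ExtractedField


def low_confidence_fields(record: PatientRecord, threshold: float = 0.7) -> List[str]:
    """Return names of extracted fields whose confidence is below the threshold."""
    flagged = []
    for f in fields(record):
        item = getattr(record, f.name)
        if isinstance(item, ExtractedField) and item.confidence < threshold:
            flagged.append(f.name)
    return flagged


record = PatientRecord(
    patient_id="P001",
    biologic_used=ExtractedField("yes", 0.92),
    main_barrier=ExtractedField("cost", 0.55),
)
print(low_confidence_fields(record))
```

Fields flagged this way could be routed to manual review rather than used directly in downstream analysis.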

LLM-Driven Resume Optimizer with Google Gemini and LangChain

This app uses Google’s Gemini API and LangChain to evaluate and optimize resumes against job descriptions. Combining role prompting, task decomposition, and chain-of-thought prompting, I engineered an AI-powered pipeline that identifies gaps, generates recommendations, and applies them to improve resume-job alignment. The project applies this pipeline to a real-world problem: improving job seekers’ chances in a competitive job market.

BERT, Encoders and Linear Models for Resume Text Classification

This project evaluates the performance of advanced NLP models and vectorization techniques for text classification using a resume dataset. Implementing Linear SVC, FNN, Encoder models, and BERT, the project achieved an accuracy of 91.67% with BERT. The project demonstrates how to build efficient preprocessing pipelines, optimize feature representations to reduce resource usage, and develop high-performing text classification models using Scikit-Learn and PyTorch.
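
To make the vectorization step concrete, here is a minimal, dependency-free sketch of TF-IDF weighting, one common way to turn resume text into features for a linear classifier. It is an assumption for illustration, not the project's pipeline: it uses the textbook formula (term frequency times `log(N / df)`), whereas Scikit-Learn's `TfidfVectorizer` applies smoothing and normalization by default, and the sample documents are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Compute textbook TF-IDF weights for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document; idf = log(N / df).
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


docs = ["python developer python", "java developer"]  # made-up examples
for doc_weights in tfidf(docs):
    print(doc_weights)
```

A term that appears in every document ("developer" here) gets a weight of zero, which is why TF-IDF tends to surface the vocabulary that distinguishes one resume category from another.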

Sentiment-Optimized Stock Price Forecasting Using Modern RNNs

This project focuses on optimizing the predictive capabilities of modern RNN models for stock price movements of TSLA, AAPL, and GOOG. The goal is to enhance forecasting accuracy by utilizing historical stock data and news sentiment data. The analysis evaluates the performance of LSTM, GRU, and Attention-CNN-LSTM models, tested with and without sentiment data, to determine their effectiveness in stock price prediction.

Airline On-Time Performance

The project explores on-time performance trends from 34 years of US domestic flight data, focusing on variations across carriers, routes, airports, and time. The exploratory data analysis (EDA) resulted in a comprehensive report with 35+ data insights and 25+ visualizations, converted into an interactive Streamlit dashboard. The analysis demonstrates how to extract critical performance trends from historical data, enabling stakeholders to make informed decisions and significantly boost operational efficiency in the aviation industry.

Biodiversity, Endangerment and Conservation in Data from the National Park Service

Embark on a captivating exploration of biodiversity with this data science project, delving into the conservation statuses of endangered species across national parks. Through meticulous analysis, uncover profound insights into the distribution of endangered species, their likelihood of endangerment, and the most frequently spotted species in each park, illuminating the intricate dynamics of wildlife preservation and ecological sustainability.