Portfolio

A collection of projects in NLP, machine learning and data science

NLP

  • AI-Powered Resume Optimizer with Google Gemini and LangChain

    AI

    LangChain

    Gemini

    Streamlit

    NLTK

    BeautifulSoup

    This app uses Google’s Gemini API and LangChain to evaluate and optimize resumes against job descriptions. Combining role prompting, task decomposition, and chain-of-thought prompting, I engineered an AI-powered pipeline that identifies gaps, generates recommendations, and applies them to improve resume-job alignment. The project applies this pipeline to a real-world problem: helping job seekers tailor their resumes to specific roles in a competitive job market.

  • BERT, Encoders and Linear Models for Resume Text Classification

    Deep Learning

    PyTorch

    TensorFlow

    HuggingFace

    Scikit-Learn

    SBERT

    Gensim

    NLTK

    pyLDAvis

    This project evaluates advanced NLP models and vectorization techniques for text classification on a resume dataset. Implementing Linear SVC, FNN, encoder models, and BERT, the project achieved an accuracy of 91.67% with BERT. It demonstrates how to build efficient preprocessing pipelines, optimize feature representations to reduce resource usage, and develop high-performing text classification models using Scikit-Learn and PyTorch.

  • Sentiment-Optimized Stock Price Forecasting Using Modern RNNs

    Deep Learning

    TensorFlow

    HuggingFace

    Scikit-Learn

    Gensim

    NLTK

    This project focuses on optimizing the predictive capabilities of modern RNN models for stock price movements of TSLA, AAPL, and GOOG. The goal is to enhance forecasting accuracy by utilizing historical stock data and news sentiment data. The analysis evaluates the performance of LSTM, GRU, and Attention-CNN-LSTM models, tested with and without sentiment data, to determine their effectiveness in stock price prediction.
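As a rough illustration of the with/without-sentiment comparison in the stock forecasting project above, the sketch below builds sliding-window inputs for a sequence model from a price series, optionally pairing each day's price with a sentiment score. The function name `make_windows`, the `lookback` parameter, and the toy numbers are assumptions made for illustration; they are not taken from the project itself.

```python
from typing import List, Optional, Sequence, Tuple

Window = List[List[float]]  # one window: `lookback` time steps x n_features


def make_windows(
    prices: Sequence[float],
    sentiment: Optional[Sequence[float]] = None,
    lookback: int = 3,
) -> Tuple[List[Window], List[float]]:
    """Turn a price series into (inputs, targets) for next-step forecasting.

    Each input holds the previous `lookback` days; each target is the next
    day's price. With `sentiment`, every time step is [price, sentiment];
    without it, every time step is [price].
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    if sentiment is not None and len(sentiment) != len(prices):
        raise ValueError("prices and sentiment must have the same length")

    inputs: List[Window] = []
    targets: List[float] = []
    for end in range(lookback, len(prices)):
        window: Window = []
        for t in range(end - lookback, end):
            step = [float(prices[t])]
            if sentiment is not None:
                step.append(float(sentiment[t]))
            window.append(step)
        inputs.append(window)
        targets.append(float(prices[end]))
    return inputs, targets


# Toy data, made up for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
scores = [0.1, -0.2, 0.3, 0.0, 0.5]

X_price, y = make_windows(prices, lookback=3)        # price-only inputs
X_both, _ = make_windows(prices, scores, lookback=3)  # price + sentiment inputs
```

Feeding `X_price` and `X_both` to the same architecture with the same targets `y` is one way to keep the comparison like-for-like, since only the feature set changes between runs.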

Exploratory Data Analysis

  • Biodiversity, Endangerment and Conservation in Data from the National Park Service

    Data Analysis

    Pandas

    NumPy

    Matplotlib

    Seaborn

    This data science project explores biodiversity by analyzing the conservation statuses of species across national parks. The analysis examines the distribution of endangered species, the likelihood of endangerment, and the most frequently spotted species in each park, giving a data-driven view of wildlife preservation.

  • Operational Analysis of Airline On-Time Performance

    EDA

    Pandas

    NumPy

    Matplotlib

    Seaborn

    Dashboarding

    The project explores on-time performance trends in 34 years of US domestic flight data, focusing on variations across carriers, routes, airports, and time. The exploratory data analysis (EDA) resulted in a comprehensive report with 35+ data insights and 25+ visualizations, converted into an interactive Streamlit dashboard. The analysis demonstrates how to extract performance trends from historical data so that stakeholders can make informed operational decisions in the aviation industry.
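To make the carrier-level comparison in the airline project above concrete, here is a minimal sketch that computes an on-time rate per carrier from (carrier, arrival delay) records. The record layout, the function name `on_time_rate_by_carrier`, and the sample flights are hypothetical. The 15-minute cutoff follows the common U.S. DOT reporting convention and is assumed here; it is not confirmed as the project's definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumption: a flight arriving less than 15 minutes late counts as on time.
ON_TIME_THRESHOLD_MIN = 15


def on_time_rate_by_carrier(flights: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return the share of on-time arrivals for each carrier code.

    `flights` yields (carrier, arrival_delay_minutes) pairs; negative delays
    mean the flight arrived early.
    """
    totals: Dict[str, int] = defaultdict(int)
    on_time: Dict[str, int] = defaultdict(int)
    for carrier, arrival_delay in flights:
        totals[carrier] += 1
        if arrival_delay < ON_TIME_THRESHOLD_MIN:
            on_time[carrier] += 1
    return {carrier: on_time[carrier] / totals[carrier] for carrier in totals}


# Made-up sample records for illustration only.
flights = [
    ("AA", -5.0),
    ("AA", 20.0),
    ("AA", 3.0),
    ("AA", 14.0),
    ("DL", 0.0),
    ("DL", 45.0),
]
rates = on_time_rate_by_carrier(flights)
```

The same grouping idea extends to routes, airports, or months by swapping the grouping key; with a dataset of this size a dataframe library would likely be the practical choice.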

Machine Learning


I am Marco Camilo, an NLP Engineer & Data Scientist.