projects

ml pipelines · streaming · big data

ananya@purdue:~$ ls -la ./projects

total 2

  • peroxide-ml.md   multi-stage XGBoost for chemical safety · Dow + The Data Mine
  • tweet-sent.md    tweet sentiment with Spark Streaming · big data project
ananya@purdue:~$ cat peroxide-ml.md
[ time-sensitive chemical identification tool ]

Dow · The Data Mine (TDM 511)

  • Worked with corporate mentors to develop a multi-stage XGBoost model that predicts peroxide formation in chemicals, using features engineered from ionic charges and molecular weights together with systematic feature selection
  • Collected and preprocessed data on 300+ chemicals
  • Deployed the model as a web application in R Shiny for non-technical users
ananya@purdue:~$ cat tweet-sent.md
[ tweet sentiment analysis · spark + hadoop ]

Big Data Project

  • Conducted sentiment analysis on streaming tweets using Spark Streaming and PySpark
  • Trained multiple machine-learning models (Logistic Regression, Naive Bayes, SVM) for sentiment classification
  • Implemented K-means clustering for data segmentation
  • Optimized model performance through hyperparameter tuning and cross-validation
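
The tuning step in the last bullet can be illustrated with a small, standard-library-only sketch of k-fold cross-validation combined with a grid search. This is not the project's code: the project lists Spark and PySpark, while the sketch below uses a toy 1-D k-nearest-neighbours classifier as a stand-in model, and every name in it (`k_fold_indices`, `knn_predict`, `cv_accuracy`, `grid_search`, the toy data) is hypothetical.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx); fold i holds out every n_folds-th sample starting at i."""
    for fold in range(n_folds):
        test_idx = list(range(fold, n_samples, n_folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


def knn_predict(train_x, train_y, x, k):
    """Stand-in model (assumption): majority vote among the k nearest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(xs, ys, k, n_folds=5):
    """Mean held-out accuracy of the stand-in model across the folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


def grid_search(xs, ys, candidate_ks, n_folds=5):
    """Score every candidate hyperparameter and return the best one with all scores."""
    results = {k: cv_accuracy(xs, ys, k, n_folds) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results


# Toy data (made up for illustration): negative scores are class 0, positive are class 1.
toy_x = [-2.0, -1.5, -1.0, -0.8, -0.5, 0.5, 0.8, 1.0, 1.5, 2.0]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

best_k, results = grid_search(toy_x, toy_y, candidate_ks=[1, 3, 5])
```

Each candidate value is scored only on samples held out from fitting, so the selected setting reflects out-of-sample accuracy rather than fit to the training data. In PySpark the same pattern is usually expressed with `CrossValidator` and `ParamGridBuilder` from `pyspark.ml.tuning`.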
ananya@purdue:~$ exit
© 2026 ananya uppal · built with nuxt · source: github