projects
ml pipelines · streaming · big data
ananya@purdue:~$ ls -la ./projects
total 2
peroxide-ml.md  multi-stage XGBoost for chemical safety · Dow + The Data Mine
tweet-sent.md   tweet sentiment with Spark Streaming · big data project
ananya@purdue:~$ cat peroxide-ml.md
[ time-sensitive chemical identification tool ]
Dow · The Data Mine (TDM 511)
- Worked with corporate mentors to develop a multi-stage XGBoost model predicting chemical peroxide formation, using feature engineering on ionic charges and molecular weights paired with systematic feature selection
- Collected and preprocessed data on 300+ chemicals
- Deployed the model as a web application in R Shiny for non-technical users
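
One way to read "multi-stage" is as a cascade: a first model screens each chemical, and a second model runs only on the ones that were flagged. The sketch below shows that control flow under this assumption; it is not the project's code. Plain rule functions stand in for trained XGBoost models, and the feature keys, thresholds, and labels (`ionic_charge`, `molecular_weight`, `class-A`, `non-former`) are hypothetical placeholders.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def cascade_predict(
    rows: List[Features],
    stage_one: Callable[[Features], bool],
    stage_two: Callable[[Features], str],
    negative_label: str = "non-former",
) -> List[str]:
    """Run stage two only on rows that stage one flags as positive."""
    labels = []
    for row in rows:
        if stage_one(row):
            labels.append(stage_two(row))
        else:
            labels.append(negative_label)
    return labels


# Hypothetical stand-ins for trained models; the thresholds are made up
# and carry no chemical meaning.
def flags_peroxide_former(row: Features) -> bool:
    return row["ionic_charge"] == 0.0 and row["molecular_weight"] < 200.0


def hazard_class(row: Features) -> str:
    return "class-A" if row["molecular_weight"] < 100.0 else "class-B"


# Toy rows for illustration only, not data from the project.
rows = [
    {"ionic_charge": 0.0, "molecular_weight": 74.0},
    {"ionic_charge": 0.0, "molecular_weight": 150.0},
    {"ionic_charge": 1.0, "molecular_weight": 58.0},
]

predictions = cascade_predict(rows, flags_peroxide_former, hazard_class)
print(predictions)
```

In a real pipeline each stand-in function would wrap a fitted model's prediction call, so the second stage is trained and evaluated only on examples that pass the first.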
ananya@purdue:~$ cat tweet-sent.md
[ tweet sentiment analysis · spark + hadoop ]
Big Data Project
- Conducted sentiment analysis on streaming tweets using Spark Streaming and PySpark
- Trained multiple machine-learning models (Logistic Regression, Naive Bayes, SVM) for sentiment classification
- Implemented K-means clustering for data segmentation
- Optimized model performance through hyperparameter tuning and cross-validation
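
The tuning step above can be illustrated with a small, dependency-free sketch: k-fold cross-validation over the smoothing parameter of a Naive Bayes sentiment classifier. This is a pure-Python stand-in, not the Spark API and not the project's code; the function names, the toy tweets, and the `alphas` grid are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Fit a multinomial Naive Bayes model with additive smoothing alpha."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab, alpha


def predict_nb(model, text):
    """Return the class with the highest log-posterior for the text."""
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for token in text.lower().split():
            if token in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(docs, labels, alpha, k):
    """Average accuracy over k folds; fold membership is index modulo k."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        model = train_nb(
            [docs[i] for i in train_idx], [labels[i] for i in train_idx], alpha
        )
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Made-up toy tweets for illustration only.
docs = [
    "love this great phone", "great service love it", "happy with this great day",
    "hate this awful phone", "awful service hate it", "sad about this awful day",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Hypothetical grid of smoothing values; pick the one with the best CV accuracy.
alphas = [0.1, 1.0, 10.0]
best_alpha = max(alphas, key=lambda a: cross_val_accuracy(docs, labels, a, k=3))
print("best alpha:", best_alpha)
```

The same pattern (a parameter grid scored by held-out folds) carries over to the Logistic Regression and SVM models; only the training and prediction functions change.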
ananya@purdue:~$ exit