Data Science Portfolio
A collection of ML and statistical modeling projects completed during the Post Graduate Program in Data Science & Business Analytics at Texas McCombs School of Business (2022-2024).
ReneWind — Predictive Maintenance for Wind Turbines
Objective: Build a classification model to predict wind turbine generator failures before they occur.
- Techniques: Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning
- Key Challenge: Highly imbalanced dataset — failures are rare but costly
- Result: Model tuned to maximize recall, so that as few failures as possible go undetected
- Relevance: Directly applicable to my work in industrial predictive maintenance
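To illustrate why recall, rather than accuracy, is the natural focus on an imbalanced failure dataset, here is a minimal sketch in plain Python. The labels and predictions are hypothetical and are not taken from the ReneWind data; the helper names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (failure) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 5 failures among 100 turbines.
y_true = [1] * 5 + [0] * 95

# A useless predictor that never flags a failure.
always_healthy = [0] * 100

# A hypothetical model: catches 4 of 5 failures, raises 10 false alarms.
model_pred = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

print(accuracy(y_true, always_healthy), recall(y_true, always_healthy))  # 0.95 0.0
print(accuracy(y_true, model_pred), recall(y_true, model_pred))          # 0.89 0.8
```

The predictor that never flags anything scores higher on accuracy while missing every failure, which is why a recall-oriented metric is a more sensible tuning target when missed failures are the costly error.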
Trade&Ahead — Stock Market Clustering
Objective: Cluster NYSE/NASDAQ stocks by financial metrics to identify investment groupings.
- Techniques: K-Means clustering, Hierarchical clustering, silhouette analysis
- Key Challenge: Scaling features and choosing the optimal number of clusters
- Result: Identified distinct clusters of growth, value, and volatile stocks
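The two challenges above, scaling and cluster selection, can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the feature values are hypothetical, and in practice scikit-learn's `StandardScaler` and `silhouette_score` would likely be used instead of these hand-written helpers.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single feature dominates Euclidean distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sigmas)) for row in rows]


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct cluster labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(        # mean distance to the nearest other cluster
            mean(dist(p, q) for q, l in zip(points, labels) if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)


# Hypothetical features per stock: (price change in %, market cap in $ millions).
rows = [(1.0, 100.0), (1.2, 110.0), (0.9, 95.0),
        (5.0, 900.0), (5.2, 950.0), (4.8, 870.0)]
scaled = standardize(rows)

good = [0, 0, 0, 1, 1, 1]  # groups the similar stocks together
bad = [0, 1, 0, 1, 0, 1]   # arbitrary assignment

print(silhouette(scaled, good), silhouette(scaled, bad))
```

A labeling that matches the real structure scores close to 1, while an arbitrary one scores much lower. Comparing this score across candidate cluster counts is one common way to pick the number of clusters.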
INN Hotels — Booking Cancellation Prediction
Objective: Predict hotel booking cancellations to optimize revenue management.
- Techniques: Logistic Regression, Decision Trees, pruning, cross-validation
- Key Challenge: Balancing precision and recall for business impact
- Result: Decision tree model with actionable overbooking recommendations
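The precision/recall balance mentioned above often comes down to where the decision threshold is set on the predicted cancellation probability. The following sketch uses hypothetical labels and scores (not the INN Hotels data) to show how the two metrics move in opposite directions as the threshold changes.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted as cancellations."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical bookings: 1 = cancelled, with a model's predicted probabilities.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.2, 0.1]

for threshold in (0.3, 0.5, 0.75):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.3f} recall={r:.3f} F1={f_beta(p, r):.3f}")
```

A low threshold flags every cancellation at the price of more false alarms, while a high threshold does the reverse. Which end is preferable depends on the relative business cost of an empty room versus an overbooked guest.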
E-news Express — A/B Testing & Statistical Inference
Objective: Analyze A/B test results for a news website's landing page redesign.
- Techniques: Hypothesis testing, chi-square tests, Mann-Whitney U test
- Key Challenge: Choosing the appropriate test for each question and checking its assumptions so that the conclusions are statistically valid
- Result: Data-driven recommendation on landing page conversion optimization
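As an illustration of the chi-square test listed above, here is a minimal sketch of a test of independence on a 2x2 table, written with the standard library only. The counts are hypothetical and do not come from the E-news Express data. The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = old page, new page; columns = converted, not converted.
table = [[21, 29], [33, 17]]
stat, p_value = chi_square_2x2(table)
print(f"chi2={stat:.3f}, p={p_value:.4f}")
```

Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic would differ from this uncorrected version unless `correction=False` is passed.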
Skills Applied
- Python (pandas, NumPy, scikit-learn, matplotlib, seaborn)
- Supervised learning: Classification, Regression
- Unsupervised learning: Clustering, PCA
- Statistical inference and hypothesis testing
- Model evaluation and hyperparameter tuning