Python · scikit-learn · Classification · Clustering

Data Science Portfolio

A collection of machine learning and statistical modeling projects completed during the Post Graduate Program in Data Science & Business Analytics at Texas McCombs (McCombs School of Business, The University of Texas at Austin), 2022-2024.

ReneWind — Predictive Maintenance for Wind Turbines

Objective: Build a classification model to predict wind turbine generator failures before they occur.

  • Techniques: Random Forest, Gradient Boosting, XGBoost, hyperparameter tuning
  • Key Challenge: Highly imbalanced dataset — failures are rare but costly
  • Result: Model selected and tuned for recall, to minimize missed failures (false negatives)
  • Relevance: Directly applicable to my work in industrial predictive maintenance
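A minimal sketch of a recall-focused setup for rare failures. It assumes scikit-learn's `make_classification` as a synthetic stand-in for the turbine sensor data (the real dataset is not reproduced here) and uses a Random Forest with `class_weight="balanced"` as one way to handle the imbalance. The grid values are illustrative, not the tuned values from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Hypothetical stand-in for sensor data: label 1 = failure (rare, ~5%), 0 = normal
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.95, 0.05], random_state=42,
)

# Stratified split keeps the rare failure class at the same rate in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" weights samples inversely to class frequency,
# so the rare failure class counts more when trees choose splits
forest = RandomForestClassifier(class_weight="balanced", random_state=42)

# Illustrative grid; scoring="recall" makes missed failures drive model selection
grid = GridSearchCV(
    forest,
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10, None]},
    scoring="recall",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
)
grid.fit(X_train, y_train)

test_recall = recall_score(y_test, grid.predict(X_test))
print("Best params:", grid.best_params_)
print(f"Test recall on failures: {test_recall:.3f}")
```

Scoring the search on recall favors models that miss fewer failures at the cost of more false alarms. Gradient Boosting or XGBoost classifiers could be dropped into the same `GridSearchCV` loop.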

Trade&Ahead — Stock Market Clustering

Objective: Cluster NYSE/NASDAQ stocks by financial metrics to identify investment groupings.

  • Techniques: K-Means clustering, Hierarchical clustering, silhouette analysis
  • Key Challenge: Scaling financial metrics with very different ranges and choosing the optimal number of clusters
  • Result: Identified distinct clusters of growth, value, and volatile stocks
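A minimal sketch of scaling plus silhouette-based selection of the number of clusters. It assumes synthetic blob data (`make_blobs`) in place of the actual stock metrics, with columns multiplied by hypothetical factors to mimic features on very different scales.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for per-stock financial metrics (300 stocks, 5 metrics)
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
# Mimic metrics on very different scales (e.g. ratios vs. dollar amounts)
X = X * np.array([1.0, 10.0, 100.0, 1000.0, 0.1])

# Standardize so no single large-valued metric dominates Euclidean distances
X_scaled = StandardScaler().fit_transform(X)

# Silhouette analysis: score K-Means for several k and keep the best
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k} (score {scores[best_k]:.3f})")

# Cross-check with hierarchical (Ward) clustering at the same k
hc_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X_scaled)
print(f"Hierarchical silhouette at k={best_k}: {silhouette_score(X_scaled, hc_labels):.3f}")
```

Without the `StandardScaler` step, the column scaled by 1000 would largely determine the distances and therefore the clusters.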

INN Hotels — Booking Cancellation Prediction

Objective: Predict hotel booking cancellations to optimize revenue management.

  • Techniques: Logistic Regression, Decision Trees, pruning, cross-validation
  • Key Challenge: Balancing precision and recall, since both missed cancellations and false alarms carry business costs
  • Result: Decision tree model with actionable overbooking recommendations
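A minimal sketch of decision tree pruning with cross-validation, using scikit-learn's cost-complexity pruning path. It assumes synthetic data in place of the INN Hotels bookings. Using F1 as the selection score is an assumption made here to weigh precision and recall equally; a business-specific cost metric could replace it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for booking features; label 1 = cancelled (~1/3 of bookings)
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.67, 0.33], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Fully grown tree, then its cost-complexity pruning path (candidate ccp_alpha values)
full_tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Drop the largest alpha (prunes to the root) and sample 15 candidates across the path
candidate_alphas = np.quantile(path.ccp_alphas[:-1], np.linspace(0, 1, 15))

# 5-fold cross-validated F1 for each pruning strength
cv_f1 = [
    cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=1),
        X_train, y_train, cv=5, scoring="f1",
    ).mean()
    for a in candidate_alphas
]
best_alpha = candidate_alphas[int(np.argmax(cv_f1))]

# Refit with the chosen alpha and check precision/recall on held-out bookings
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=1).fit(X_train, y_train)
y_pred = pruned.predict(X_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"Leaves: {full_tree.get_n_leaves()} -> {pruned.get_n_leaves()} (ccp_alpha={best_alpha:.5f})")
print(f"Test precision: {precision:.3f}, recall: {recall:.3f}")
```

A smaller pruned tree is also easier to translate into readable rules, which is what turns a model into concrete overbooking recommendations.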

E-news Express — A/B Testing & Statistical Inference

Objective: Analyze A/B test results for a news website's landing page redesign.

  • Techniques: Hypothesis testing, chi-square tests, Mann-Whitney U test
  • Key Challenge: Choosing tests whose assumptions fit the data, so that conclusions about significance are valid
  • Result: Data-driven recommendation on whether the redesigned landing page improves conversion
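A minimal sketch of the two tests named above, using `scipy.stats` (installed as a scikit-learn dependency). The data is hypothetical and simulated: 50 users per page version, with made-up time-on-page distributions and conversion rates.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(7)

# Hypothetical simulated A/B data: 50 users shown the old page, 50 the new one
old_minutes = rng.exponential(scale=4.5, size=50)  # time spent on page
new_minutes = rng.exponential(scale=6.0, size=50)
old_converted = rng.random(50) < 0.42  # True = user converted
new_converted = rng.random(50) < 0.66

# Mann-Whitney U (one-sided): do users spend more time on the new page?
# Rank-based, so it does not assume normally distributed times.
u_stat, p_time = mannwhitneyu(new_minutes, old_minutes, alternative="greater")

# Chi-square test of independence on a 2x2 table (rows: page, cols: converted / not)
table = np.array([
    [old_converted.sum(), (~old_converted).sum()],
    [new_converted.sum(), (~new_converted).sum()],
])
# For 2x2 tables SciPy applies Yates' continuity correction by default
chi2, p_conv, dof, expected = chi2_contingency(table)

alpha = 0.05
print(f"Time on page: U={u_stat:.1f}, p={p_time:.4f}, reject H0: {p_time < alpha}")
print(f"Conversion:   chi2={chi2:.2f}, p={p_conv:.4f}, reject H0: {p_conv < alpha}")
```

Fixing the significance level and the one-sided alternative before looking at the data is part of the test design; choosing them afterwards inflates the false-positive rate.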

Skills Applied

  • Python (pandas, NumPy, scikit-learn, matplotlib, seaborn)
  • Supervised learning: Classification, Regression
  • Unsupervised learning: Clustering, PCA
  • Statistical inference and hypothesis testing
  • Model evaluation and hyperparameter tuning