AI/ML
Completed

WebCreek Employee Churn Prediction

Machine Learning HR Analytics solution predicting employee turnover probability, identifying key churn indicators, and providing actionable retention strategies.

Overview

Supervised binary classification project that predicts employee turnover at WebCreek, an outsourcing company based in the USA. It analyzes multiple variables from employee datasets and compares salaries by role and country against market rates to identify employees at risk of leaving. Of the seven algorithms compared, the Extra-Trees Classifier achieved the best ROC-AUC score, using engineered features such as tenure, contract expiration, performance reviews, and salary relative to market rates.

Technologies & Tools

Python
Scikit-learn
Pandas
NumPy
Matplotlib
Seaborn
Flask
Extra-Trees Classifier
Category Encoders
GridSearchCV

Key Features

  • Binary classification model predicting employee churn with Extra-Trees Classifier
  • Comprehensive feature engineering: tenure, contract expiration, performance reviews, salary comparison
  • Comparison of 7 ML algorithms: Random Forest, SVC, Neural Network, KNN, Gradient Boosting, AdaBoost, Extra-Trees
  • Flask REST API for real-time predictions with risk categorization (Safe, Low, Medium, High)
  • Integration with SalaryExpert data for market rate comparisons across positions and countries
  • Model evaluation using ROC-AUC, confusion matrix, precision, recall, and F1-score
  • Feature importance analysis identifying Department, Contract Expiration, and Business Unit as top predictors
  • Automated risk scoring system with actionable retention recommendations
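
The evaluation metrics listed above can be computed with scikit-learn roughly as follows. This is a minimal sketch on made-up labels and scores, not the project's data or results; the 0.5 decision threshold is also an assumption.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Illustrative values only: 1 = employee left, 0 = employee stayed.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.05, 0.10, 0.30, 0.65, 0.70, 0.95, 0.15, 0.40, 0.20, 0.85]

# ROC-AUC is computed on the predicted probabilities, not on hard labels.
auc = roc_auc_score(y_true, y_score)

# The remaining metrics need hard predictions; 0.5 is an assumed threshold.
y_pred = [int(score >= 0.5) for score in y_score]
cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f"ROC-AUC={auc:.3f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(cm)
```

On a small, imbalanced dataset, ROC-AUC is a reasonable headline metric because it does not depend on the chosen threshold, while precision and recall show the trade-off at the threshold actually used.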

Challenges & Solutions

Feature Engineering for HR Analytics

Challenge: Raw employee data needed transformation into meaningful predictive features. Salary data was in various currencies and positions varied by country.

Solution: Engineered new features: months at company, remaining contract months, months after last review, and an intern flag. Salaries were normalized to USD and compared against SalaryExpert market data to place each employee relative to the average for their position and country (well above, above, below, or well below average).
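
A minimal pandas sketch of these engineered features is shown below. All column names, the sample rows, the exchange rates, and the salary band cut-offs (1.20, 1.00, 0.80) are hypothetical; the write-up does not state the actual schema or thresholds.

```python
import pandas as pd


def salary_band(ratio: float) -> str:
    """Map salary / market-average ratio to a band (cut-offs are assumed)."""
    if pd.isna(ratio):
        return "unknown"  # no market benchmark for this position/country
    if ratio >= 1.20:
        return "well above"
    if ratio >= 1.00:
        return "above"
    if ratio >= 0.80:
        return "below"
    return "well below"


def engineer_features(employees, market, fx_to_usd, as_of):
    df = employees.copy()

    def months_since(col):
        # Whole calendar months from the date in `col` up to `as_of`.
        return (as_of.year - df[col].dt.year) * 12 + (as_of.month - df[col].dt.month)

    df["months_at_company"] = months_since("hire_date")
    df["remaining_contract_months"] = -months_since("contract_end")
    df["months_after_last_review"] = months_since("last_review")
    df["is_intern"] = df["position"].str.contains("intern", case=False).astype(int)

    # Normalize salaries to USD, then compare with the market average
    # for the same position and country.
    df["salary_usd"] = df["salary"] * df["currency"].map(fx_to_usd)
    df = df.merge(market, on=["position", "country"], how="left")
    df["salary_position"] = (df["salary_usd"] / df["market_salary_usd"]).map(salary_band)
    return df


# Made-up rows for illustration only.
employees = pd.DataFrame({
    "position": ["Developer", "QA Intern"],
    "country": ["Mexico", "USA"],
    "salary": [400_000, 30_000],
    "currency": ["MXN", "USD"],
    "hire_date": pd.to_datetime(["2021-01-15", "2022-03-01"]),
    "contract_end": pd.to_datetime(["2022-12-31", "2022-08-31"]),
    "last_review": pd.to_datetime(["2022-01-10", "2022-03-01"]),
})
market = pd.DataFrame({
    "position": ["Developer", "QA Intern"],
    "country": ["Mexico", "USA"],
    "market_salary_usd": [24_000, 28_000],
})
fx_to_usd = {"MXN": 0.05, "USD": 1.0}  # assumed rates

features = engineer_features(employees, market, fx_to_usd, pd.Timestamp("2022-06-30"))
print(features[["months_at_company", "remaining_contract_months", "is_intern", "salary_position"]])
```

The left merge keeps employees whose position and country have no market benchmark; they fall into an "unknown" band instead of being dropped.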

Model Selection and Hyperparameter Tuning

Challenge: Needed to identify the best-performing algorithm from multiple candidates while avoiding overfitting on a limited dataset of 469 employees.

Solution: Used a stratified train-test split to preserve the class distribution in both sets. Applied GridSearchCV with cross-validation to tune hyperparameters for each of the 7 algorithms. The Extra-Trees Classifier achieved the best ROC-AUC score after tuning n_estimators, criterion, and bootstrap.
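
The tuning step for the Extra-Trees Classifier might look like the sketch below. It runs on synthetic data from make_classification (469 rows to mirror the dataset size); the grid values, the 80/20 split, the 5-fold setup, and the random seeds are assumptions, and only the three parameter names come from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the employee dataset (imbalanced binary target).
X, y = make_classification(
    n_samples=469, n_features=12, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)

# stratify=y keeps the churn rate roughly equal in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumed grid values; the project tuned these three parameters.
param_grid = {
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
    "bootstrap": [True, False],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_train, y_train)

# Score the refitted best model on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Repeating the same search with a different estimator and grid for each candidate, then comparing held-out ROC-AUC, is one way to run the seven-way comparison.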

Actionable Business Insights

Challenge: Model predictions needed to translate into specific, actionable retention strategies for HR teams, not just probability scores.

Solution: Analyzed feature importances to identify key churn indicators: zero projects, high performance reviews, low tenure (<3 months), contract expiration timing, and salary positioning. Created 4-tier risk categorization (Safe <20%, Low 20-60%, Medium 60-90%, High ≥90%) with specific retention recommendations for each tier.
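
The tier mapping can be expressed as a small function. This sketch assumes each tier's lower bound is inclusive (so exactly 60% counts as Medium), which the ranges above leave open; the function name is hypothetical.

```python
def risk_tier(probability: float) -> str:
    """Map a predicted churn probability (0.0 to 1.0) to a risk tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.90:
        return "High"
    if probability >= 0.60:  # assumed: lower bounds are inclusive
        return "Medium"
    if probability >= 0.20:
        return "Low"
    return "Safe"


for p in (0.05, 0.20, 0.75, 0.93):
    print(f"{p:.2f} -> {risk_tier(p)}")
```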

Project Information

Timeline

Apr 2022 - Jul 2022

Role

Data Scientist / ML Engineer

Project Metrics

Dataset Size

469 employees

Models Evaluated

7

Risk Categories

4

Best Model

Extra-Trees