Professional
AI/ML
Completed

WebCreek Employee Churn Prediction

Machine Learning HR Analytics solution predicting employee turnover probability, identifying key churn indicators, and providing actionable retention strategies.

Overview

Supervised binary classification project predicting employee turnover at WebCreek, an outsourcing company based in the USA. The project analyzes multiple variables from employee datasets and compares salaries by role and country against market rates to identify employees at risk of leaving. Of the models evaluated, the Extra-Trees Classifier performed best, using engineered features such as tenure, contract expiration, performance reviews, and salary relative to market rates.

Technologies & Tools

Python
Scikit-learn
Pandas
NumPy
Matplotlib
Seaborn
Flask
Extra-Trees Classifier
Category Encoders
GridSearchCV

Model Results & Visualizations

ROC curves comparison of 7 ML algorithms for employee churn prediction

ROC-AUC comparison: Extra-Trees (0.88) scored highest among the 7 evaluated algorithms

Feature importance chart showing top predictors for employee churn

Top 3 predictors: Department (10.6%), Contract Expiration (9%), Business Unit (8.4%)

Confusion matrix for Extra-Trees classifier showing prediction accuracy

Extra-Trees model: 67 correct predictions out of 83 test samples (80.7% accuracy)

Additional Resources

Correlation heatmap showing relationships between employee features

Feature correlations: Salary-Age (0.41), Months-Projects (0.44)

Key Features

  • Binary classification model predicting employee churn with Extra-Trees Classifier
  • Comprehensive feature engineering: tenure, contract expiration, performance reviews, salary comparison
  • Comparison of 7 ML algorithms: Random Forest, SVC, Neural Network, KNN, Gradient Boosting, AdaBoost, Extra-Trees
  • Flask REST API for real-time predictions with risk categorization (Safe, Low, Medium, High)
  • Integration with SalaryExpert data for market rate comparisons across positions and countries
  • Model evaluation using ROC-AUC, confusion matrix, precision, recall, and F1-score
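As a rough illustration of the evaluation metrics listed above, the sketch below computes them with scikit-learn on a tiny hand-made example. The labels, probabilities, and the 0.5 decision threshold are all made up for illustration and are not taken from the project's test set.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy data, NOT project results: 1 = employee left, 0 = employee stayed.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.3, 0.2, 0.7, 0.4, 0.8, 0.6, 0.35]  # predicted churn probability

# Assumed threshold of 0.5 to turn probabilities into class labels.
y_pred = [int(p >= 0.5) for p in y_prob]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# ROC-AUC is computed from the probabilities, so it does not depend on the threshold.
auc = roc_auc_score(y_true, y_prob)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

ROC-AUC summarizes ranking quality across all thresholds, while precision, recall, and F1 describe behavior at one chosen threshold, which is why the two kinds of metric are usually reported together.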

Challenges & Solutions

Feature Engineering for HR Analytics

Challenge: Raw employee data needed transformation into meaningful predictive features. Salary data was in various currencies and positions varied by country.

Solution: Created engineered features: months at company, remaining contract months, months after last review, intern flag. Integrated SalaryExpert data to calculate relative salary positions (well above/above/below/well below average) by position and country, normalizing to USD.
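The engineered features described above could be derived with pandas roughly as follows. This is a minimal sketch under stated assumptions: the column names, the sample rows, the reference date, and the ±20% cut-offs for the salary bands are hypothetical, since the original schema and thresholds are not shown.

```python
import pandas as pd

# Hypothetical raw columns and values; the real schema is not published.
employees = pd.DataFrame({
    "position": ["Developer", "QA Intern"],
    "country": ["Mexico", "Colombia"],
    "hire_date": pd.to_datetime(["2021-01-15", "2022-03-01"]),
    "contract_end": pd.to_datetime(["2022-12-31", "2022-08-31"]),
    "last_review": pd.to_datetime(["2022-01-10", "2022-05-01"]),
    "salary_usd": [30000.0, 9000.0],          # assumed already converted to USD
    "market_salary_usd": [24000.0, 10000.0],  # market rate for same position and country
})

# Assumed reference date for all "months since / months until" features.
as_of = pd.Series(pd.Timestamp("2022-06-30"), index=employees.index)


def months_between(start: pd.Series, end: pd.Series) -> pd.Series:
    """Calendar-month difference between two datetime Series (days are ignored)."""
    return (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)


features = pd.DataFrame({
    "months_at_company": months_between(employees["hire_date"], as_of),
    "remaining_contract_months": months_between(as_of, employees["contract_end"]),
    "months_since_review": months_between(employees["last_review"], as_of),
    "is_intern": employees["position"].str.contains("intern", case=False),
})

# Salary relative to the market rate, bucketed into four bands.
# The 0.8 / 1.0 / 1.2 boundaries are illustrative assumptions.
ratio = employees["salary_usd"] / employees["market_salary_usd"]
features["salary_position"] = pd.cut(
    ratio,
    bins=[0, 0.8, 1.0, 1.2, float("inf")],
    labels=["well below", "below", "above", "well above"],
)

print(features)
```

Working from a ratio in a single currency keeps the salary bands comparable across countries, which is the point of normalizing to USD before bucketing.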

Model Selection and Hyperparameter Tuning

Challenge: Needed to identify the best-performing algorithm from multiple candidates while avoiding overfitting on a limited dataset of 469 employees.

Solution: Used a stratified train-test split to preserve the class distribution in both sets. Applied GridSearchCV with cross-validation to tune hyperparameters for each of the 7 algorithms. The Extra-Trees Classifier achieved the best ROC-AUC score after tuning n_estimators, criterion, and bootstrap.
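A stratified split followed by a cross-validated grid search might look like the sketch below. It runs on synthetic data because the employee dataset is not public. The grid values, the 5-fold setting, the class balance, and the random seeds are assumptions; only the tuned parameter names, the dataset size of 469, and the 83-sample test set come from the project description.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 469-row employee dataset (class balance is assumed).
X, y = make_classification(
    n_samples=469, n_features=12, weights=[0.7, 0.3], random_state=0
)

# stratify=y keeps the churn / no-churn ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=83, stratify=y, random_state=0
)

# Illustrative grid over the three parameters named in the write-up.
param_grid = {
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
    "bootstrap": [True, False],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # select the configuration by cross-validated ROC-AUC
    cv=5,
)
search.fit(X_train, y_train)

# The held-out test set is scored once, after tuning, with the refitted best model.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Tuning only on the training folds and touching the test set once is what limits overfitting to the small dataset. The same loop can be repeated with a different estimator and grid for each candidate algorithm.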

Actionable Business Insights

Challenge: Model predictions needed to translate into specific, actionable retention strategies for HR teams, not just probability scores.

Solution: Analyzed feature importances to identify key churn indicators: zero projects, high performance reviews, low tenure (<3 months), contract expiration timing, and salary positioning. Created 4-tier risk categorization (Safe <20%, Low 20-60%, Medium 60-90%, High ≥90%) with specific retention recommendations for each tier.
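The four-tier mapping can be expressed as a small function. This is a sketch, not the project's actual code: the function name is hypothetical, and because the stated ranges overlap at exactly 60%, that boundary value is assigned to Medium here as an assumption.

```python
def risk_tier(probability: float) -> str:
    """Map a churn probability in [0, 1] to Safe, Low, Medium, or High."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.20:
        return "Safe"
    if probability < 0.60:
        return "Low"
    if probability < 0.90:  # exactly 0.60 lands here (assumed boundary handling)
        return "Medium"
    return "High"


for p in (0.05, 0.20, 0.60, 0.90):
    print(f"{p:.2f} -> {risk_tier(p)}")
```

Keeping the thresholds in one place makes it straightforward to attach a retention recommendation to each returned tier.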

Project Information

Timeline

Started: Apr 2022

Last updated: Jul 2022

Role

Data Scientist / ML Engineer

Project Metrics

Dataset Size: 469

Models Evaluated: 7

Risk Categories: 4

Best Model: Extra-Trees