WebCreek Employee Churn Prediction
Machine Learning HR Analytics solution predicting employee turnover probability, identifying key churn indicators, and providing actionable retention strategies.
Overview
Supervised binary classification project predicting employee turnover at WebCreek, an outsourcing company based in the USA. The project analyzes multiple variables from employee datasets and compares salaries by role and country against market rates to identify employees at risk of leaving. Of the seven algorithms compared, the Extra-Trees Classifier achieved the best ROC-AUC score, using engineered features that include tenure, contract expiration, performance reviews, and salary relative to market rates.
Technologies & Tools
Key Features
- Binary classification model predicting employee churn with Extra-Trees Classifier
- Comprehensive feature engineering: tenure, contract expiration, performance reviews, salary comparison
- Comparison of 7 ML algorithms: Random Forest, SVC, Neural Network, KNN, Gradient Boosting, AdaBoost, Extra-Trees
- Flask REST API for real-time predictions with risk categorization (Safe, Low, Medium, High)
- Integration with SalaryExpert data for market rate comparisons across positions and countries
- Model evaluation using ROC-AUC, confusion matrix, precision, recall, and F1-score
- Feature importance analysis identifying Department, Contract Expiration, and Business Unit as top predictors
- Automated risk scoring system with actionable retention recommendations
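The evaluation metrics listed above (ROC-AUC, confusion matrix, precision, recall, F1-score) can be computed with scikit-learn roughly as follows. This is a minimal sketch, not the project's actual code: the data is synthetic (generated with make_classification at the same size as the 469-employee dataset), and the class balance, split ratio, and model settings are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the employee data (assumption: ~20% churners).
X, y = make_classification(
    n_samples=469, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = ExtraTreesClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# ROC-AUC needs scores (churn probabilities); the other metrics need hard labels.
churn_proba = model.predict_proba(X_test)[:, 1]
churn_pred = model.predict(X_test)

metrics = {
    "roc_auc": roc_auc_score(y_test, churn_proba),
    "precision": precision_score(y_test, churn_pred, zero_division=0),
    "recall": recall_score(y_test, churn_pred, zero_division=0),
    "f1": f1_score(y_test, churn_pred, zero_division=0),
}
# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, churn_pred)

# Impurity-based importances, one value per feature, summing to 1.
importances = model.feature_importances_

print(metrics)
print(cm)
```

On imbalanced churn data, recall on the positive class and ROC-AUC are usually more informative than plain accuracy, which is presumably why the project reports them.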
Challenges & Solutions
Feature Engineering for HR Analytics
Challenge: Raw employee data needed transformation into meaningful predictive features. Salary data was in various currencies and positions varied by country.
Solution: Engineered new features: months at company, remaining contract months, months since last review, and an intern flag. Integrated SalaryExpert data, normalized to USD, to place each salary relative to the market average (well above, above, below, or well below) for the same position and country.
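The engineered features described above could be derived with pandas along these lines. This is a sketch under stated assumptions: the column names, the sample rows, the reference date, and the ±20% cut-offs for the salary categories are all hypothetical, since the source does not give the real schema or thresholds. Salaries are assumed to be already converted to USD.

```python
import pandas as pd

# Hypothetical schema and sample rows (not the real WebCreek data).
employees = pd.DataFrame({
    "hire_date": pd.to_datetime(["2021-01-15", "2022-03-01"]),
    "contract_end": pd.to_datetime(["2022-12-31", "2022-09-30"]),
    "last_review": pd.to_datetime(["2022-01-10", "2022-05-20"]),
    "position": ["Software Engineer", "QA Intern"],
    "salary_usd": [65000.0, 13500.0],
    "market_avg_usd": [50000.0, 15000.0],  # e.g. market rate per position and country
})

# Reference date for all "months since / months until" features (assumption).
as_of = pd.Timestamp("2022-07-01")


def month_index(dates: pd.Series) -> pd.Series:
    """Map dates to a running month count so differences are whole calendar months."""
    return dates.dt.year * 12 + dates.dt.month


as_of_index = as_of.year * 12 + as_of.month

features = pd.DataFrame({
    "months_at_company": as_of_index - month_index(employees["hire_date"]),
    "remaining_contract_months": month_index(employees["contract_end"]) - as_of_index,
    "months_since_review": as_of_index - month_index(employees["last_review"]),
    "is_intern": employees["position"].str.contains("intern", case=False),
})

# Salary relative to market average, bucketed into the four categories.
# The 0.8 / 1.0 / 1.2 boundaries are illustrative assumptions.
salary_ratio = employees["salary_usd"] / employees["market_avg_usd"]
features["salary_position"] = pd.cut(
    salary_ratio,
    bins=[0, 0.8, 1.0, 1.2, float("inf")],
    labels=["well below", "below", "above", "well above"],
)

print(features)
```

Counting whole calendar months keeps the features as small integers; a day-level difference divided by 30 would work too if finer resolution were needed.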
Model Selection and Hyperparameter Tuning
Challenge: Needed to identify the best-performing algorithm from multiple candidates while avoiding overfitting on a limited dataset of 469 employees.
Solution: Used a stratified train-test split to preserve the class distribution. Tuned hyperparameters for all 7 algorithms with GridSearchCV and cross-validation. The Extra-Trees Classifier achieved the best ROC-AUC score after tuning n_estimators, criterion, and bootstrap.
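A stratified split followed by a cross-validated grid search might look like the sketch below for the Extra-Trees case. The three tuned parameters (n_estimators, criterion, bootstrap) come from the description above; the candidate values, the 80/20 split, the 5-fold setup, and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in sized like the 469-employee dataset (assumption: ~20% churners).
X, y = make_classification(
    n_samples=469, n_features=10, weights=[0.8, 0.2], random_state=0
)

# stratify=y keeps the churn rate the same in the train and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate values are illustrative, not the grid used in the project.
param_grid = {
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
    "bootstrap": [False, True],
}

search = GridSearchCV(
    estimator=ExtraTreesClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # select the configuration with the best mean ROC-AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# The held-out test set is scored only once, with the refit best estimator.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

print(search.best_params_)
print(f"CV ROC-AUC: {search.best_score_:.3f}, test ROC-AUC: {test_auc:.3f}")
```

Repeating the same search with each of the other six estimators and comparing their held-out ROC-AUC scores would give the kind of model comparison described above. With only a few hundred rows, keeping the test set out of the search is what guards against an over-optimistic estimate.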
Actionable Business Insights
Challenge: Model predictions needed to translate into specific, actionable retention strategies for HR teams, not just probability scores.
Solution: Analyzed feature importances to identify key churn indicators: zero projects, high performance reviews, low tenure (<3 months), contract expiration timing, and salary positioning. Created 4-tier risk categorization (Safe <20%, Low 20-60%, Medium 60-90%, High ≥90%) with specific retention recommendations for each tier.
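The four-tier categorization above can be expressed as a small mapping from predicted churn probability to tier. This is a sketch: the function name is hypothetical, and assigning the shared boundaries (exactly 20% and exactly 60%) to the higher tier is an assumption, since the source only states that High starts at 90% or more.

```python
def risk_tier(churn_probability: float) -> str:
    """Map a churn probability in [0, 1] to one of the four risk tiers.

    Tiers: Safe < 20%, Low 20-60%, Medium 60-90%, High >= 90%.
    Boundary values are assigned to the higher tier (assumption).
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= 0.90:
        return "High"
    if churn_probability >= 0.60:
        return "Medium"
    if churn_probability >= 0.20:
        return "Low"
    return "Safe"


# Example: tiers for a few predicted probabilities.
for p in (0.05, 0.35, 0.75, 0.95):
    print(f"{p:.0%} -> {risk_tier(p)}")
```

Checking thresholds from highest to lowest keeps each condition to a single comparison and makes the boundary handling explicit.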
Project Information
Timeline
Apr 2022 - Jul 2022
Role
Data Scientist / ML Engineer
Project Metrics
Dataset Size
469 employees
Models Evaluated
7
Risk Categories
4
Best Model
Extra-Trees