AWS AutoML Lite - Serverless AutoML Platform
Cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect problem types (classification/regression), and train machine learning models with FLAML. ~85-95% cheaper than SageMaker Autopilot.
Overview
A lightweight, production-ready AutoML platform leveraging AWS serverless services. Features a split architecture: a FastAPI + Mangum Lambda API (5MB) and Docker containers on AWS Batch Fargate Spot for training (265MB of ML dependencies). Built with Python 3.11+, Next.js 16, and Terraform 1.9+. Implements automatic problem type detection, EDA report generation with ydata-profiling, intelligent feature preprocessing (ID column detection, constant/duplicate removal), and model export (.pkl). Infrastructure as Code with Terraform manages 44+ AWS resources. The SSR frontend is deployed on AWS Amplify (Node.js 20+). Runs at roughly $10-25/month (vs. $50-200 for SageMaker Autopilot), with complete CI/CD via GitHub Actions OIDC.
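
As a rough illustration of the Lambda side, a minimal FastAPI + Mangum entry point could look like the sketch below (route names and payload fields are assumptions, not the project's actual API):

```python
# Minimal FastAPI + Mangum entry point for a small Lambda API package
# (illustrative sketch; routes and payloads are assumptions).
from fastapi import FastAPI
from mangum import Mangum
from pydantic import BaseModel

app = FastAPI(title="AutoML Lite API")


class JobRequest(BaseModel):
    dataset_key: str    # S3 key of the uploaded CSV
    target_column: str  # column to predict


@app.post("/jobs")
def create_job(req: JobRequest) -> dict:
    # In the real service this would validate the dataset, store job metadata,
    # and submit an AWS Batch training job (see the Batch sketch further down).
    return {"status": "submitted", "dataset": req.dataset_key}


# Mangum adapts API Gateway / Lambda events to ASGI, keeping the package small.
handler = Mangum(app)
```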
Technologies & Tools
Architecture & System Design

Main architecture: Split design with Lambda API (5MB) and AWS Batch containers for ML training (265MB)

Data flow: CSV upload → S3 → Lambda preprocessing → Batch training → S3 model storage (sketched in code after this list)

Training pipeline: AWS Batch Fargate Spot with Docker containers running FLAML AutoML

CI/CD: GitHub Actions with OIDC for secure AWS authentication and Terraform deployments
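
The data flow above could be wired with boto3 roughly as follows; the bucket, queue, and job definition names (automl-lite-data, automl-training-queue, automl-train-job) are placeholders, not the project's real resource names:

```python
# Sketch of the CSV -> S3 -> Batch flow (resource names are placeholders).
import boto3

s3 = boto3.client("s3")
batch = boto3.client("batch")


def presigned_upload_url(job_id: str) -> str:
    # The client uploads the CSV directly to S3; Lambda never handles the file body.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "automl-lite-data", "Key": f"uploads/{job_id}.csv"},
        ExpiresIn=3600,
    )


def submit_training_job(job_id: str, target_column: str) -> str:
    # All training context travels as environment variables; the container
    # reads them with os.getenv() and writes results back to S3/DynamoDB.
    response = batch.submit_job(
        jobName=f"train-{job_id}",
        jobQueue="automl-training-queue",
        jobDefinition="automl-train-job",
        containerOverrides={
            "environment": [
                {"name": "JOB_ID", "value": job_id},
                {"name": "DATASET_KEY", "value": f"uploads/{job_id}.csv"},
                {"name": "TARGET_COLUMN", "value": target_column},
            ]
        },
    )
    return response["jobId"]
```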
Performance Metrics

Model evaluation metrics: Accuracy, F1-Score, Precision, Recall for classification models
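
For reference, these metrics can be computed with scikit-learn as in the sketch below (weighted averaging for multi-class targets is an assumption):

```python
# Sketch of computing the listed classification metrics with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def classification_metrics(y_true, y_pred) -> dict:
    # "weighted" averaging is an assumption for multi-class targets.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="weighted"),
        "precision": precision_score(y_true, y_pred, average="weighted"),
        "recall": recall_score(y_true, y_pred, average="weighted"),
    }
```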
Application Screenshots

Configuration UI with column selection, auto problem type detection (Classification/Regression), and excluded columns

Real-time training progress with AWS Batch job status tracking

Download section with model (.pkl), reports (.html), and Docker-based prediction commands
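
A downloaded model can be used for local predictions roughly as follows, assuming the .pkl is a pickled scikit-learn-compatible estimator and the new data has the same feature columns as training:

```python
# Sketch: load the exported model and predict on new rows
# (assumes a pickled scikit-learn-compatible estimator/pipeline).
import pickle
import pandas as pd

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

new_data = pd.read_csv("new_rows.csv")  # same feature columns as training
predictions = model.predict(new_data)
print(predictions[:10])
```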
Additional Resources

Automated EDA report generated with ydata-profiling showing dataset overview and alerts (see the generation sketch after this list)

Training report with best model summary, metrics, and hyperparameters
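
A minimal sketch of how such a report could be produced and shared, assuming ydata-profiling plus a presigned S3 URL as described in the Key Features below; the bucket and key names are placeholders:

```python
# Sketch: generate the EDA report with ydata-profiling, store it in S3, and
# return a presigned URL (bucket/key names are placeholders).
import boto3
import pandas as pd
from ydata_profiling import ProfileReport


def build_eda_report(csv_path: str, job_id: str) -> str:
    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Dataset Overview", minimal=True)
    report.to_file("/tmp/eda_report.html")

    s3 = boto3.client("s3")
    key = f"reports/{job_id}/eda_report.html"
    s3.upload_file("/tmp/eda_report.html", "automl-lite-artifacts", key)

    # A presigned URL lets the frontend fetch the report without a public bucket.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "automl-lite-artifacts", "Key": key},
        ExpiresIn=3600,
    )
```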
Key Features
- Split architecture: Lambda for the API (5MB, fast cold starts) and AWS Batch containers for ML training (265MB of dependencies, >15 min jobs)
- Automatic problem type detection: classification (<20 unique values or <5% unique ratio) or regression (see the sketch after this list)
- Intelligent feature preprocessing with feature-engine: auto-detects ID columns via regex patterns, removes constant/duplicate features
- FLAML AutoML with LightGBM, Random Forest, and Extra Trees estimators for efficient hyperparameter tuning
- Automated EDA report generation with ydata-profiling, stored in S3 with presigned URL access
- AWS Batch Fargate Spot for 70% cost savings on training jobs (92% cheaper than Lambda for ML workloads)
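
A sketch of the detection heuristic and FLAML call referenced in the feature list above; the thresholds and estimator names come from the list, while the function names and the 600-second time budget are illustrative assumptions:

```python
# Sketch: classification if the target has <20 unique values or a unique ratio
# under 5%, otherwise regression; then hand off to FLAML.
import pandas as pd
from flaml import AutoML


def detect_problem_type(target: pd.Series) -> str:
    n_unique = target.nunique(dropna=True)
    unique_ratio = n_unique / max(len(target), 1)
    return "classification" if n_unique < 20 or unique_ratio < 0.05 else "regression"


def train(df: pd.DataFrame, target_column: str, time_budget_s: int = 600) -> AutoML:
    X, y = df.drop(columns=[target_column]), df[target_column]
    automl = AutoML()
    automl.fit(
        X, y,
        task=detect_problem_type(y),
        estimator_list=["lgbm", "rf", "extra_tree"],  # estimators named in the feature list
        time_budget=time_budget_s,                    # budget value is an assumption
    )
    return automl
```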
Challenges & Solutions
Lambda Package Size Limits for ML Dependencies
Challenge: Initial attempts to deploy ML training code directly in Lambda failed due to the 250MB deployment package limit. ML dependencies (FLAML, scikit-learn, xgboost, lightgbm) totaled ~265MB, exceeding Lambda limits.
Solution: Implemented a split architecture: Lambda handles API requests (5MB compressed) while AWS Batch runs training in Docker containers, which have no such size limit. This also resolved the 15-minute Lambda timeout for long-running training jobs (2-60 minutes). Cost analysis showed Batch Fargate Spot is 92% cheaper than Lambda for training workloads.
Low Model Accuracy Due to Irrelevant Features
Challenge: Initial training run showed only 35.98% accuracy. Investigation revealed the model was trying to learn from random identifiers (Order_ID, Customer_ID) that provided no predictive value.
Solution: Integrated feature-engine library with custom ID pattern detection using regex patterns (.*_?id$, ^id_?.*, etc.). Combined with constant feature detection (>98% same value) and duplicate feature removal. Result: Successfully auto-detected and excluded identifier columns, dramatically improving model accuracy.
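
A hedged sketch of this preprocessing step, combining a regex ID-column check with feature-engine's constant and duplicate feature droppers; the helper names are illustrative, while the patterns and 98% threshold mirror the description above:

```python
# Sketch: drop ID-like, quasi-constant, and duplicate columns before training.
import re
import pandas as pd
from feature_engine.selection import DropConstantFeatures, DropDuplicateFeatures

# Regex patterns for identifier-like column names (as described above).
ID_PATTERNS = [r".*_?id$", r"^id_?.*"]


def drop_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    id_cols = [
        col for col in df.columns
        if any(re.fullmatch(p, col, flags=re.IGNORECASE) for p in ID_PATTERNS)
    ]
    return df.drop(columns=id_cols)


def preprocess_features(df: pd.DataFrame) -> pd.DataFrame:
    df = drop_id_columns(df)
    # Drop features where >98% of rows share one value, then exact duplicates.
    df = DropConstantFeatures(tol=0.98).fit_transform(df)
    df = DropDuplicateFeatures().fit_transform(df)
    return df
```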
Container Isolation and Environment Variable Cascade
Challenge: Training container initially tried to call the API for configuration, causing circular dependencies and network issues. The container needed to operate autonomously without API access.
Solution: Established an environment variable cascade: Terraform (lambda.tf) → Lambda env vars → batch_service.py containerOverrides → train.py os.getenv(). The training container receives all of its context via environment variables and writes directly to DynamoDB/S3. Added validation in train.py to fail fast with clear error messages when required variables are missing.
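
A sketch of that fail-fast validation in the training entry point; the variable names reuse the earlier Batch sketch, and RESULTS_BUCKET is an illustrative addition rather than the project's actual configuration:

```python
# Sketch: the training container reads all of its context from environment
# variables and fails fast with a clear message if anything is missing.
import os
import sys

REQUIRED_ENV_VARS = ["JOB_ID", "DATASET_KEY", "TARGET_COLUMN", "RESULTS_BUCKET"]


def load_config() -> dict:
    config = {name: os.getenv(name) for name in REQUIRED_ENV_VARS}
    missing = [name for name, value in config.items() if not value]
    if missing:
        # Exit non-zero so the Batch job is marked FAILED with an actionable log line.
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
    return config


if __name__ == "__main__":
    config = load_config()
    print(f"Starting training for job {config['JOB_ID']}")
```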
Project Information
Timeline
Started: Nov 2025
Last updated: Dec 2025
Role
Full Stack Developer + AI/ML Engineer
Project Metrics
- Cost savings vs SageMaker: 85-95%
- Monthly cost: $10-25
- AWS resources managed: 44+
- Training container size: 265MB
- Lambda API size: 5MB
- Lambda cold start: <2s