Featured Project · Open Source · Side Project · AI/ML · Completed

AWS AutoML Lite - Serverless AutoML Platform

Cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect problem types (classification/regression), and train machine learning models with FLAML. ~85-95% cheaper than SageMaker Autopilot.

Overview

A lightweight, production-ready AutoML platform built on AWS serverless services. It uses a split architecture: a FastAPI + Mangum API on Lambda (5MB package) and Docker containers on AWS Batch Fargate Spot for training (265MB of ML dependencies). Built with Python 3.11+, Next.js 16, and Terraform 1.9+, it implements automatic problem type detection, EDA report generation with ydata-profiling, intelligent feature preprocessing (ID column detection, constant/duplicate removal), and model export (.pkl). Infrastructure as Code with Terraform manages 44+ AWS resources, and the SSR frontend is deployed on AWS Amplify (Node.js 20+). The platform runs at ~$10-25/month (vs. $50-200 for SageMaker Autopilot) with complete CI/CD via GitHub Actions OIDC.
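
The API side of this split is small enough to ship as a regular Lambda package. The sketch below shows the general shape of a FastAPI app wrapped with Mangum; the route names, payloads, and behavior are illustrative assumptions, not the project's actual endpoints.

```python
# Minimal sketch of the Lambda API entry point (FastAPI + Mangum).
# Route names and response shapes are illustrative, not the project's actual code.
from fastapi import FastAPI, UploadFile
from mangum import Mangum

app = FastAPI(title="AWS AutoML Lite API")

@app.post("/datasets")
async def upload_dataset(file: UploadFile):
    # In the real service this would stream the CSV to S3 and register it in DynamoDB.
    contents = await file.read()
    return {"filename": file.filename, "size_bytes": len(contents)}

@app.post("/jobs/{dataset_id}")
def start_training(dataset_id: str, target_column: str):
    # A Lambda-friendly handler only enqueues work; the heavy ML runs on AWS Batch.
    return {"dataset_id": dataset_id, "target_column": target_column, "status": "SUBMITTED"}

# Mangum adapts the ASGI app to the Lambda / API Gateway event format.
handler = Mangum(app)
```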

Technologies & Tools

AWS Lambda
AWS Batch
AWS Fargate Spot
AWS Amplify
AWS S3
AWS DynamoDB
AWS API Gateway
AWS ECR
FastAPI
Mangum
Python
FLAML
scikit-learn
XGBoost
LightGBM
ydata-profiling
feature-engine
Recharts
Next.js
Node.js
TypeScript
React
Tailwind CSS
Terraform
Docker
GitHub Actions
Pydantic

Architecture & System Design

AWS AutoML Lite main architecture diagram showing serverless components and data flow

Main architecture: Split design with Lambda API (5MB) and AWS Batch containers for ML training (265MB)

Data flow architecture showing S3, DynamoDB, Lambda, and Batch interactions

Data flow: CSV upload → S3 → Lambda preprocessing → Batch training → S3 model storage

Training architecture with AWS Batch Fargate Spot and FLAML AutoML

Training pipeline: AWS Batch Fargate Spot with Docker containers running FLAML AutoML

CI/CD pipeline architecture with GitHub Actions and OIDC authentication

CI/CD: GitHub Actions with OIDC for secure AWS authentication and Terraform deployments

Performance Metrics

Model results page displaying performance metrics including accuracy, F1 score, precision, and recall

Model evaluation metrics: Accuracy, F1-Score, Precision, Recall for classification models

Application Screenshots

AWS AutoML Lite configuration page showing target column selection with automatic problem type detection

Configuration UI with column selection, auto problem type detection (Classification/Regression), and excluded columns

Training progress page showing AWS Batch job status and real-time updates

Real-time training progress with AWS Batch job status tracking

Results page showing download buttons for model, EDA report, training report, and Docker usage instructions

Download section with model (.pkl), reports (.html), and Docker-based prediction commands

Additional Resources

ydata-profiling EDA report showing dataset overview and data quality alerts

Automated EDA report generated with ydata-profiling showing dataset overview and alerts
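
For context, report generation of this kind typically boils down to a few lines of ydata-profiling plus an S3 upload. The sketch below illustrates the idea; the bucket/key names, the minimal-mode setting, and the one-hour expiry are assumptions, not the project's actual configuration.

```python
# Sketch of EDA report generation and delivery via presigned URL.
import boto3
import pandas as pd
from ydata_profiling import ProfileReport

def generate_eda_report(csv_path: str, bucket: str, key: str) -> str:
    df = pd.read_csv(csv_path)

    # Minimal mode keeps report generation fast and memory-friendly in small containers.
    profile = ProfileReport(df, title="Dataset EDA", minimal=True)
    profile.to_file("/tmp/eda_report.html")

    # Upload the HTML report and hand back a time-limited presigned URL.
    s3 = boto3.client("s3")
    s3.upload_file("/tmp/eda_report.html", bucket, key, ExtraArgs={"ContentType": "text/html"})
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600
    )
```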

FLAML training report showing best model summary and performance metrics

Training report with best model summary, metrics, and hyperparameters

Key Features

  • Split architecture: Lambda for API (5MB, fast cold starts) and AWS Batch containers for ML training (265MB dependencies, >15min jobs)
  • Automatic problem type detection: classification (<20 unique values or <5% unique ratio) or regression (see the sketch after this list)
  • Intelligent feature preprocessing with feature-engine: auto-detects ID columns via regex patterns, removes constant/duplicate features
  • FLAML AutoML with LightGBM, Random Forest, and Extra Trees estimators for efficient hyperparameter tuning
  • Automated EDA report generation with ydata-profiling, stored in S3 with presigned URL access
  • AWS Batch Fargate Spot for 70% cost savings on training jobs (92% cheaper than Lambda for ML workloads)
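
The problem type heuristic and the FLAML estimator setup from the list above can be sketched roughly as follows; the thresholds come straight from the list, while the function names and time budget are illustrative assumptions.

```python
# Sketch of the problem-type heuristic and the FLAML training call.
import pandas as pd
from flaml import AutoML

def detect_problem_type(target: pd.Series) -> str:
    n_unique = target.nunique()
    unique_ratio = n_unique / len(target)
    # Few distinct values, or a very low unique ratio, suggests classification.
    if n_unique < 20 or unique_ratio < 0.05:
        return "classification"
    return "regression"

def train_model(X: pd.DataFrame, y: pd.Series, time_budget_s: int = 300) -> AutoML:
    automl = AutoML()
    automl.fit(
        X_train=X,
        y_train=y,
        task=detect_problem_type(y),
        time_budget=time_budget_s,
        # FLAML estimator names for LightGBM, Random Forest, and Extra Trees.
        estimator_list=["lgbm", "rf", "extra_tree"],
    )
    return automl
```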

Challenges & Solutions

Lambda Package Size Limits for ML Dependencies

Challenge: Initial attempts to deploy ML training code directly in Lambda failed due to the 250MB deployment package limit. ML dependencies (FLAML, scikit-learn, xgboost, lightgbm) totaled ~265MB, exceeding Lambda limits.

Solution: Implemented a split architecture in which Lambda handles API requests (5MB compressed) and AWS Batch runs training in Docker containers (no size limit). This also resolved the 15-minute Lambda timeout constraint for long-running training jobs (2-60 minutes). Cost analysis showed Batch Fargate Spot is 92% cheaper than Lambda for training workloads.
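
In practice the handoff from Lambda to Batch is a single boto3 call. The sketch below shows the general pattern; the job queue, job definition, and environment variable names are illustrative, not the project's actual values.

```python
# Sketch of the Lambda-to-Batch handoff via boto3.
import boto3

batch = boto3.client("batch")

def submit_training_job(job_id: str, bucket: str, dataset_key: str, target_column: str) -> str:
    response = batch.submit_job(
        jobName=f"automl-train-{job_id}",
        jobQueue="automl-fargate-spot-queue",
        jobDefinition="automl-training-job",
        containerOverrides={
            # The container receives everything it needs via environment variables,
            # so it never calls back into the API.
            "environment": [
                {"name": "JOB_ID", "value": job_id},
                {"name": "DATA_BUCKET", "value": bucket},
                {"name": "DATASET_KEY", "value": dataset_key},
                {"name": "TARGET_COLUMN", "value": target_column},
            ]
        },
    )
    return response["jobId"]
```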

Low Model Accuracy Due to Irrelevant Features

Challenge: An initial training run reached only 35.98% accuracy. Investigation revealed the model was learning from random identifiers (Order_ID, Customer_ID) that provided no predictive value.

Solution: Integrated the feature-engine library with custom ID pattern detection using regex patterns (.*_?id$, ^id_?.*, etc.), combined with constant feature detection (>98% same value) and duplicate feature removal. Result: identifier columns are now auto-detected and excluded, dramatically improving model accuracy.
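
A rough sketch of this preprocessing step, using the regex patterns quoted above and feature-engine's built-in selectors; the pipeline details and function names are illustrative.

```python
# Sketch of ID-column, quasi-constant, and duplicate feature removal.
import re
import pandas as pd
from feature_engine.selection import DropConstantFeatures, DropDuplicateFeatures

ID_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r".*_?id$", r"^id_?.*")]

def drop_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    id_cols = [c for c in df.columns if any(p.match(c) for p in ID_PATTERNS)]
    return df.drop(columns=id_cols)

def preprocess_features(df: pd.DataFrame) -> pd.DataFrame:
    df = drop_id_columns(df)
    # tol=0.98 drops quasi-constant features where >=98% of rows share one value;
    # missing_values="ignore" tolerates NaNs instead of raising.
    df = DropConstantFeatures(tol=0.98, missing_values="ignore").fit_transform(df)
    df = DropDuplicateFeatures(missing_values="ignore").fit_transform(df)
    return df
```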

Container Isolation and Environment Variable Cascade

Challenge: The training container initially tried to call the API for configuration, causing circular dependencies and network issues. The container needed to operate autonomously, without API access.

Solution: Established an environment variable cascade: Terraform (lambda.tf) → Lambda env vars → batch_service.py containerOverrides → train.py os.getenv(). The training container receives all of its context via environment variables and writes directly to DynamoDB/S3. Added validation in train.py to fail fast with clear error messages when variables are missing.
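
A minimal sketch of that fail-fast check at the top of the training entry point, reusing the illustrative variable names from the Batch submission sketch above:

```python
# Sketch of fail-fast environment validation in the training entry point.
import os
import sys

REQUIRED_ENV_VARS = ("JOB_ID", "DATA_BUCKET", "DATASET_KEY", "TARGET_COLUMN")

def load_config() -> dict:
    missing = [name for name in REQUIRED_ENV_VARS if not os.getenv(name)]
    if missing:
        # Exit with an explicit message instead of a confusing error mid-training.
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_ENV_VARS}
```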

Project Information

Timeline

Started: Nov 2025

Last updated: Dec 2025

Role

Full Stack Developer + AI/ML Engineer

Project Metrics

  • Cost Savings vs SageMaker: 85-95%
  • Monthly Cost: $10-25
  • AWS Resources Managed: 44+
  • Training Container Size: 265MB
  • Lambda API Size: 5MB
  • Lambda Cold Start: <2s

External Links