Featured Project
Open Source
Side Project
AI/ML
Completed

AWS AutoML Lite - Serverless AutoML Platform

Cost-effective AutoML platform built on AWS serverless architecture. Upload CSV files, automatically detect the problem type (classification or regression), train ML models with FLAML, and deploy them for serverless inference via Lambda + ONNX Runtime. Runs at ~$3-25/month, compared with ~$36-171/month for SageMaker endpoints.

Overview

A lightweight, production-ready AutoML platform built on AWS serverless services. It uses a split architecture: FastAPI + Mangum for the Lambda API (5MB) and Docker containers on AWS Batch Fargate Spot for training (265MB of ML dependencies). The stack is Python 3.11+, Next.js 16, and Terraform 1.9+, with all infrastructure defined as code in Terraform and the SSR frontend deployed on AWS Amplify (Node.js 20+).

The platform implements automatic problem type detection, EDA report generation with ydata-profiling, intelligent feature preprocessing (ID column detection, constant/duplicate removal), serverless model inference with ONNX Runtime on Lambda ($0 idle vs ~$36-171/month for SageMaker), model export (.pkl and .onnx), comparison of up to 4 training runs, dark mode with system preference detection, and training run tags/notes for experiment organization. It runs at ~$3-25/month ($0 when idle), with complete CI/CD via GitHub Actions OIDC.
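The overview mentions automatic problem type detection without spelling out the rule. A minimal sketch of one plausible heuristic follows, assuming that non-numeric targets, or integer-valued targets with few distinct values, indicate classification. The function name `detect_problem_type` and the `max_classes` threshold are hypothetical and not taken from the project.

```python
from typing import Sequence


def detect_problem_type(target: Sequence[object], max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from the target column values.

    Hypothetical heuristic for illustration; the project's actual rule may differ.
    """
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")

    def is_number(v: object) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # Any non-numeric value (strings, booleans) implies class labels.
    if not all(is_number(v) for v in values):
        return "classification"

    # Integer-valued targets with few distinct values look like class codes.
    all_integral = all(float(v).is_integer() for v in values)
    if all_integral and len(set(values)) <= max_classes:
        return "classification"
    return "regression"
```

A threshold-based rule like this can misclassify count-style regression targets with few distinct values, which is one reason to let the user override the detected type in the configuration UI.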

Technologies & Tools

AWS Lambda
AWS Batch
AWS Fargate Spot
AWS Amplify
AWS S3
AWS DynamoDB
AWS API Gateway
AWS ECR
FastAPI
Mangum
Python
FLAML
scikit-learn
XGBoost
LightGBM
ONNX Runtime
ydata-profiling
feature-engine
Next.js
Node.js
TypeScript
React
Tailwind CSS
next-themes
Terraform
Docker
GitHub Actions
Pydantic

Architecture & System Design

AWS AutoML Lite main architecture diagram showing serverless components and data flow

Main architecture: Split design with Lambda API (5MB) and AWS Batch containers for ML training (265MB)

Data flow architecture showing S3, DynamoDB, Lambda, and Batch interactions

Data flow: CSV upload → S3 → Lambda preprocessing → Batch training → S3 model storage

Training architecture with AWS Batch Fargate Spot and FLAML AutoML

Training pipeline: AWS Batch Fargate Spot with Docker containers running FLAML AutoML

CI/CD pipeline architecture with GitHub Actions and OIDC authentication

CI/CD: GitHub Actions with OIDC for secure AWS authentication and Terraform deployments

Performance Metrics

Model results page displaying performance metrics including accuracy, F1 score, precision, and recall

Model evaluation metrics: Accuracy, F1-Score, Precision, Recall for classification (R², RMSE, MAE for regression)
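For reference, the metrics named above can be computed as in the following sketch. It uses only the standard library and assumes binary classification with a single positive label; the function names are illustrative, and the platform itself presumably relies on its ML libraries for these values.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence, y_pred: Sequence, positive=1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary problem."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R², RMSE, and MAE for a regression problem."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }
```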

Application Screenshots

AWS AutoML Lite configuration page showing target column selection with automatic problem type detection

Configuration UI with column selection, auto problem type detection (Classification/Regression), and excluded columns

Training progress page showing AWS Batch job status and real-time updates

Real-time training progress with AWS Batch job status tracking

Prediction Playground with serverless Lambda inference showing real-time predictions

Prediction Playground: Interactive form with serverless Lambda + ONNX inference, showing class prediction with probabilities

Download section with model files (.pkl and .onnx) and usage instructions

Download models (.pkl and .onnx formats) with Docker and Python usage instructions

Additional Resources

ydata-profiling EDA report showing dataset overview and data quality alerts

Automated EDA report generated with ydata-profiling showing dataset overview and alerts

FLAML training report showing best model summary and performance metrics

Training report with best model summary, metrics, and hyperparameters

Key Features

  • Serverless Model Inference: One-click deployment of models for prediction via Lambda + ONNX Runtime ($0 idle vs ~$36-171/month for SageMaker)
  • Prediction Playground: Interactive UI for testing deployed models with real-time predictions and confidence scores
  • ONNX Model Export: Cross-platform model format (.onnx) alongside .pkl for portable deployment
  • Model Comparison: Side-by-side comparison of up to 4 training runs with metric highlighting
  • Dark Mode: Full theme support with system preference detection and manual toggle
  • Training Tags & Notes: Organize and annotate experiments with custom tags (up to 10) and notes (up to 1000 chars)
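The tag and note limits in the last feature could be enforced with a small validation step like the sketch below. The limits (10 tags, 1000 characters) come from the feature list; the class name `RunAnnotations` and the normalization rules (trimming whitespace, dropping blanks and duplicates) are assumptions for illustration. The project lists Pydantic among its tools, but this sketch stays with the standard library.

```python
from dataclasses import dataclass, field
from typing import List

MAX_TAGS = 10            # limit stated in the feature list
MAX_NOTES_CHARS = 1000   # limit stated in the feature list


@dataclass
class RunAnnotations:
    """Tags and notes attached to one training run (hypothetical model)."""

    tags: List[str] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self) -> None:
        # Assumed normalization: trim whitespace, drop blanks and duplicates,
        # and keep the original order.
        cleaned: List[str] = []
        for tag in self.tags:
            tag = tag.strip()
            if tag and tag not in cleaned:
                cleaned.append(tag)
        if len(cleaned) > MAX_TAGS:
            raise ValueError(f"at most {MAX_TAGS} tags allowed, got {len(cleaned)}")
        if len(self.notes) > MAX_NOTES_CHARS:
            raise ValueError(f"notes limited to {MAX_NOTES_CHARS} characters")
        self.tags = cleaned
```

Validating on the API side as well as in the UI keeps oversized annotations out of storage even when requests bypass the frontend.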

Challenges & Solutions

Lambda Package Size Limits for ML Dependencies

Challenge: Initial attempts to deploy the ML training code directly in Lambda failed because of the 250MB (unzipped) deployment package limit. The ML dependencies (FLAML, scikit-learn, xgboost, lightgbm) totaled ~265MB, which exceeds that limit.

Solution: Implemented a split architecture: Lambda handles API requests (5MB compressed), while AWS Batch runs training in Docker containers, which are not subject to the Lambda package limit. This also removed the 15-minute Lambda timeout as a constraint for long-running training jobs (2-60 minutes). Cost analysis showed that Batch Fargate Spot is 92% cheaper than Lambda for training workloads.
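The decision behind the split can be expressed as a check against the two Lambda limits mentioned above: the 250MB package limit and the 15-minute timeout. This is only an illustrative sketch; `choose_runtime` is a hypothetical helper, not part of the project.

```python
LAMBDA_MAX_PACKAGE_MB = 250      # deployment package limit cited above
LAMBDA_MAX_TIMEOUT_S = 15 * 60   # 15-minute execution limit cited above


def choose_runtime(package_mb: float, expected_seconds: float) -> str:
    """Return 'lambda' when a workload fits both limits, otherwise 'batch'."""
    fits_package = package_mb <= LAMBDA_MAX_PACKAGE_MB
    fits_timeout = expected_seconds <= LAMBDA_MAX_TIMEOUT_S
    return "lambda" if fits_package and fits_timeout else "batch"
```

With the figures from this project, the 5MB API fits in Lambda, while the ~265MB training dependencies and the jobs of up to 60 minutes each fail one of the two checks.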

Low Model Accuracy Due to Irrelevant Features

Challenge: The initial training run reached only 35.98% accuracy. Investigation revealed that the model was trying to learn from random identifiers (Order_ID, Customer_ID) that carry no predictive value.

Solution: Integrated the feature-engine library with custom ID detection based on regex patterns for column names (.*_?id$, ^id_?.*, etc.), combined with constant feature detection (>98% same value) and duplicate feature removal. Result: identifier columns are now auto-detected and excluded before training, which dramatically improved model accuracy.
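The three filters described above can be sketched with the standard library alone. The two regex patterns and the 98% threshold come from the text; the function names, the case-insensitive matching, and the dict-of-columns table layout are assumptions, and the project itself uses feature-engine rather than this hand-rolled version.

```python
import re
from collections import Counter
from typing import Dict, Sequence

# The two patterns quoted above; the "etc." in the text suggests the project uses more.
ID_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r".*_?id$", r"^id_?.*")]
CONSTANT_THRESHOLD = 0.98  # a column is dropped when one value exceeds this share


def looks_like_id(name: str) -> bool:
    """True when the column name matches one of the ID patterns."""
    return any(p.match(name) for p in ID_PATTERNS)


def is_quasi_constant(values: Sequence[object]) -> bool:
    """True when the most frequent value covers more than 98% of rows."""
    if not values:
        return True
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values) > CONSTANT_THRESHOLD


def columns_to_drop(table: Dict[str, Sequence[object]]) -> Dict[str, str]:
    """Map each excluded column to the reason it was excluded."""
    reasons: Dict[str, str] = {}
    seen: Dict[tuple, str] = {}  # column contents -> first column with them
    for name, values in table.items():
        key = tuple(values)
        if looks_like_id(name):
            reasons[name] = "id"
        elif is_quasi_constant(values):
            reasons[name] = "constant"
        elif key in seen:
            reasons[name] = "duplicate"
        else:
            seen[key] = name
    return reasons
```

Taken literally, these two patterns are broad: they also match names such as "paid" or "identity", so a real implementation would likely pair name matching with a check on the values (for example, near-unique values per row) before excluding a column.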

Container Isolation and Environment Variable Cascade

Challenge: The training container initially tried to call the API for its configuration, which caused circular dependencies and network issues. The container needed to operate autonomously, without API access.

Solution: Established an environment variable cascade: Terraform (lambda.tf) → Lambda env vars → batch_service.py containerOverrides → train.py os.getenv(). The training container receives all of its context via environment variables and writes directly to DynamoDB/S3. Added validation in train.py so that it fails fast with clear error messages when variables are missing.
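The last two stages of this cascade can be sketched as follows. The `environment` list of `name`/`value` pairs is the shape that AWS Batch SubmitJob expects under `containerOverrides`; the variable names in `REQUIRED_VARS` and both function names are hypothetical, since the source does not list them.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical variable names; the project's actual names are not given in the text.
REQUIRED_VARS = ("JOB_ID", "DATASET_S3_URI", "TARGET_COLUMN", "JOBS_TABLE")


def build_container_overrides(context: Mapping[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """API side (batch_service.py role): turn job context into containerOverrides."""
    return {"environment": [{"name": k, "value": str(v)} for k, v in context.items()]}


def load_training_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Container side (train.py role): read config and fail fast when incomplete."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing variable in a single error, rather than stopping at the first one, shortens the debugging loop when a variable is dropped somewhere between Terraform and the container.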

Project Information

Timeline

Started: Nov 2025

Last updated: Dec 2025

Role

Full Stack Developer + AI/ML Engineer

Project Metrics

Monthly Cost: ~$3-25

SageMaker Equivalent: $36-171/mo

Idle Cost: $0

ML Algorithms: 4

Lambda Cold Start: <2s

External Links