Air Quality Prediction - ML/DL Comparison
Master's thesis comparing machine learning and deep learning techniques for air quality forecasting using SARIMAX, DeepAR, LSTM, and Neural Prophet.
Overview
Thesis project for the Master's degree in Artificial Intelligence at Universidad Internacional de La Rioja (UNIR). It is a comparative study of machine learning and deep learning models for predicting ozone concentration, using the London Average Air Quality Levels (LAAQL) dataset from King's College London.
Technologies & Tools
Model Results & Visualizations

Correlation matrix heatmap of air quality variables including O3, NO2, PM2.5, and PM10

SARIMAX model predictions compared with actual ozone concentration values

LSTM deep learning model predictions for ozone concentration forecasting

DeepAR probabilistic forecasting model with uncertainty quantification
Additional Resources

Ground-level ozone (O3) concentration time series from London LAAQL dataset

Statistical diagnostic plots for SARIMAX model validation
Key Features
- Comparative analysis of 4 forecasting algorithms: SARIMAX, DeepAR, LSTM, and Neural Prophet
- Time series prediction of ground-level ozone (O3) concentration
- Performance evaluation using MSE, MAE, RMSE, and MAPE metrics
- Data preprocessing and feature engineering for air quality data
- Visualization of prediction results and model comparison
- Statistical analysis of model performance across different time horizons
Challenges & Solutions
Model Selection and Comparison
Challenge: Needed to identify the most suitable algorithms for time series forecasting of air quality data and establish fair comparison criteria.
Solution: Selected 4 distinct approaches (statistical: SARIMAX, probabilistic: DeepAR, deep learning: LSTM and Neural Prophet) and evaluated them using standardized regression metrics (MSE, MAE, RMSE, MAPE) on the same dataset.
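A minimal sketch of how the four metrics can be computed identically for every model, so that the comparison stays fair. The function name `regression_metrics`, the model labels, and all numbers are hypothetical and only illustrate the calculation; they are not thesis results or the thesis code.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, MAE, RMSE and MAPE (in percent) for one set of forecasts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined where the actual value is zero, so those points are skipped.
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape}

# Illustrative values only: the same held-out targets are scored for each model.
actual = [40.0, 50.0, 60.0, 80.0]
forecasts = {
    "model_a": [42.0, 48.0, 63.0, 76.0],
    "model_b": [45.0, 55.0, 50.0, 90.0],
}
scores = {name: regression_metrics(actual, pred) for name, pred in forecasts.items()}

# Rank by RMSE; any of the four metrics could serve as the ranking key.
best = min(scores, key=lambda name: scores[name]["RMSE"])
```

Scoring every model against the same `actual` values with one shared function keeps differences in the metrics attributable to the models rather than to the evaluation code.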
Time Series Data Preprocessing
Challenge: The air quality data contained missing values and outliers, and required temporal feature engineering for accurate predictions.
Solution: Implemented a data cleaning pipeline with interpolation for missing values, outlier detection, and temporal feature extraction. Created train/validation/test splits that respect temporal order.
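The cleaning steps described above could look roughly like the following pandas sketch. It makes several assumptions that are not stated in the project: a 1.5 × IQR rule for outlier detection, time-based linear interpolation, hour/day-of-week/month calendar features, and a 70/15/15 split. The column name `o3`, both function names, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def preprocess(series: pd.Series) -> pd.DataFrame:
    """Clean a datetime-indexed ozone series and add calendar features."""
    s = series.copy()
    # Assumed outlier rule: values beyond 1.5 * IQR from the quartiles become gaps.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)] = np.nan
    # Fill original and outlier-induced gaps by interpolating along the time axis.
    s = s.interpolate(method="time").bfill().ffill()
    df = s.to_frame("o3")
    # Temporal features derived from the timestamp index.
    df["hour"] = df.index.hour
    df["dayofweek"] = df.index.dayofweek
    df["month"] = df.index.month
    return df

def chronological_split(df: pd.DataFrame, train: float = 0.7, val: float = 0.15):
    """Split without shuffling so validation and test data lie strictly after training data."""
    n = len(df)
    i = round(n * train)
    j = round(n * (train + val))
    return df.iloc[:i], df.iloc[i:j], df.iloc[j:]

# Synthetic hourly data with one gap and one spike, for illustration only.
index = pd.date_range("2022-01-01", periods=200, freq=pd.Timedelta(hours=1))
values = 40 + 10 * np.sin(np.arange(200) * 2 * np.pi / 24)
values[50] = np.nan   # missing reading
values[120] = 500.0   # implausible spike
clean = preprocess(pd.Series(values, index=index))
train_df, val_df, test_df = chronological_split(clean)
```

Splitting by position rather than at random avoids leaking future observations into training, which would otherwise make the forecasting errors look better than they are.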
Project Information
Timeline
Started: Sep 2022
Last updated: Mar 2023
Role
Researcher / Developer
Project Metrics
Best RMSE: 3.29%
Best MAPE: 9.55%
Models Compared: 4