Open Source
Academic
AI/ML
Completed

Air Quality Prediction - ML/DL Comparison

Master's thesis comparing machine learning and deep learning techniques for air quality forecasting using SARIMAX, DeepAR, LSTM, and Neural Prophet.

Overview

Thesis project for the Master's degree in Artificial Intelligence at Universidad Internacional de La Rioja (UNIR). A comparative study of machine learning and deep learning models for predicting ozone (O3) concentration, using the London Average Air Quality Levels (LAAQL) dataset from King's College London.

Technologies & Tools

Python
SARIMAX
DeepAR
LSTM
Neural Prophet
Pandas
NumPy
Scikit-learn
TensorFlow
GluonTS
Optuna
Keras
Time Series Analysis

Model Results & Visualizations

Correlation matrix heatmap of air quality variables, including O3, NO2, PM2.5, and PM10

SARIMAX model predictions compared with actual O3 concentration values

LSTM model predictions compared with actual O3 concentration values

DeepAR probabilistic forecasts of O3 concentration, with confidence intervals for uncertainty quantification

Additional Resources

Ground-level ozone (O3) concentration time series from the London LAAQL dataset

SARIMAX diagnostic plots (residuals and Q-Q plot) used to validate the model

Key Features

  • Comparative analysis of 4 forecasting algorithms: SARIMAX, DeepAR, LSTM, and Neural Prophet
  • Time series prediction of ground-level ozone (O3) concentration
  • Performance evaluation using MSE, MAE, RMSE, and MAPE metrics
  • Data preprocessing and feature engineering for air quality data
  • Visualization of prediction results and model comparison
  • Statistical analysis of model performance across different time horizons
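The four evaluation metrics in the list above can be illustrated with a short, dependency-free sketch, assuming point forecasts aligned one-to-one with observations and MAPE expressed as a percentage. This is not necessarily how the thesis computed them. `rmse_by_horizon` is a hypothetical helper showing one way to break performance down by forecast step, as described in the last bullet.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and MAPE (as a percentage) for paired observations and point forecasts."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # MAPE divides by the observed value, so it is undefined at zero.
        raise ValueError("MAPE is undefined when an observed value is zero")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def rmse_by_horizon(y_true_windows, y_pred_windows):
    """Hypothetical helper: RMSE for each forecast step, given rolling multi-step forecasts.

    y_true_windows[i][h] and y_pred_windows[i][h] are the observation and the forecast
    made at origin i for h + 1 steps ahead.
    """
    horizon = len(y_true_windows[0])
    n = len(y_true_windows)
    return [
        math.sqrt(sum((t[h] - p[h]) ** 2 for t, p in zip(y_true_windows, y_pred_windows)) / n)
        for h in range(horizon)
    ]
```

For example, `regression_metrics([10, 20, 40], [12, 18, 40])` gives an MSE of 8/3 and a MAPE of about 10%. RMSE shares the units of the target, while MAPE is scale-free, which is why both are useful when comparing models.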

Challenges & Solutions

Model Selection and Comparison

Challenge: Needed to identify the most suitable algorithms for time series forecasting of air quality data and establish fair comparison criteria.

Solution: Selected 4 distinct approaches (statistical: SARIMAX; probabilistic deep learning: DeepAR; deep learning: LSTM and Neural Prophet) and evaluated all of them on the same dataset using the same regression metrics (MSE, MAE, RMSE, MAPE).
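The fair-comparison setup described above can be sketched as a small harness: every model sees the same chronological training prefix and is scored on the same held-out suffix with the same metrics. The two forecasters here are hypothetical naive baselines standing in for SARIMAX, DeepAR, LSTM, and Neural Prophet, whose real library APIs are not reproduced; any callable with the assumed `(history, horizon)` signature could be plugged in.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumed interface: a forecaster maps (training history, horizon) to point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_forecaster(history, horizon):
    """Hypothetical baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive_forecaster(season):
    """Hypothetical baseline: repeat the last full seasonal cycle (e.g. 24 for hourly data)."""
    def forecast(history, horizon):
        last_cycle = list(history[-season:])
        return [last_cycle[h % season] for h in range(horizon)]
    return forecast


def compare_models(series, models: Dict[str, Forecaster], test_size):
    """Train every model on the same prefix and score it on the same held-out suffix."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must leave a non-empty training prefix")
    train, test = series[:-test_size], series[-test_size:]
    results = {}
    for name, model in models.items():
        preds = model(train, test_size)
        errors = [t - p for t, p in zip(test, preds)]
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        # MAPE assumes no zero observations in the test window.
        mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, test)) / len(errors)
        results[name] = {"RMSE": rmse, "MAPE": mape}
    # Rank by RMSE so the best-scoring model comes first.
    return dict(sorted(results.items(), key=lambda kv: kv[1]["RMSE"]))
```

Keeping the split and the metric code outside the models is the design choice that makes the comparison fair: no model can be tuned against a different test window or scored with a different formula.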

Time Series Data Preprocessing

Challenge: The air quality data contained missing values and outliers, and accurate predictions required proper temporal feature engineering.

Solution: Implemented a data cleaning pipeline with interpolation of missing values, outlier detection, and temporal feature extraction. Created train/validation/test splits that respect temporal order, so no future observations leak into training.
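The preprocessing steps above can be sketched without external dependencies. The specific choices are assumptions, not the thesis's documented settings: linear interpolation for gaps, Tukey's IQR fences (k = 1.5) with clipping for outliers, hour/day-of-week/month as temporal features, and a 70/15/15 chronological split.

```python
import bisect
import statistics
from datetime import datetime, timedelta


def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest observed neighbours.

    Leading or trailing gaps are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of the first observed point after i
        if pos == 0:
            filled[i] = values[known[0]]
        elif pos == len(known):
            filled[i] = values[known[-1]]
        else:
            left, right = known[pos - 1], known[pos]
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences) to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def temporal_features(ts):
    """Calendar features intended to capture daily, weekly and seasonal ozone cycles."""
    return {"hour": ts.hour, "dayofweek": ts.weekday(), "month": ts.month}


def chronological_split(rows, train_frac=0.7, val_frac=0.15):
    """Split without shuffling, so validation and test rows always follow training rows in time."""
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


if __name__ == "__main__":
    # Illustrative toy values only, not data from the thesis.
    start = datetime(2022, 9, 5, 0)
    raw = [40.0, None, 44.0, 46.0, 250.0, 43.0, None, 39.0]
    cleaned = clip_outliers_iqr(interpolate_missing(raw))
    rows = [
        {**temporal_features(start + timedelta(hours=h)), "o3": v}
        for h, v in enumerate(cleaned)
    ]
    train, val, test = chronological_split(rows)
    print(len(train), len(val), len(test))
```

For example, `interpolate_missing([1.0, None, 3.0])` returns `[1.0, 2.0, 3.0]`. Interpolation runs before outlier clipping here because `statistics.quantiles` needs a series with no gaps.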

Project Information

Timeline

Started: Sep 2022

Last updated: Mar 2023

Role

Researcher / Developer

Project Metrics

Best RMSE: 3.29%

Best MAPE: 9.55%

Models Compared: 4
