Open Source

Academic

AI/ML

Completed

Air Quality Prediction - ML/DL Comparison

Master's thesis comparing machine learning and deep learning techniques for air quality forecasting using SARIMAX, DeepAR, LSTM, and Neural Prophet.

Overview

Thesis project for the Master's degree in Artificial Intelligence at Universidad Internacional de La Rioja (UNIR). It is a comparative study of machine learning and deep learning models for predicting ozone concentration, using the London Average Air Quality Levels (LAAQL) dataset from King's College London.

Technologies & Tools

Python

SARIMAX

DeepAR

LSTM

Neural Prophet

Pandas

NumPy

Scikit-learn

TensorFlow

GluonTS

Optuna

Keras

Time Series Analysis

Model Results & Visualizations

Figure: Correlation matrix heatmap showing relationships between air quality variables, including O3, NO2, PM2.5, and PM10.

Figure: SARIMAX model predictions compared with actual ozone (O3) concentration values.

Figure: LSTM neural network predictions compared with actual ozone (O3) concentration values.

Figure: DeepAR probabilistic forecast with confidence intervals (uncertainty quantification).

Additional Resources

Figure: Original ground-level ozone (O3) concentration time series from the London LAAQL dataset.

Figure: SARIMAX diagnostic plots used for model validation, including residuals and a Q-Q plot.

Key Features

1. Comparative analysis of 4 forecasting algorithms: SARIMAX, DeepAR, LSTM, and Neural Prophet
2. Time series prediction of ground-level ozone (O3) concentration
3. Performance evaluation using MSE, MAE, RMSE, and MAPE metrics
4. Data preprocessing and feature engineering for air quality data
5. Visualization of prediction results and model comparison
6. Statistical analysis of model performance across different time horizons
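
The four evaluation metrics listed above have standard definitions, sketched below in dependency-free Python. This is a minimal illustration rather than the thesis code: the function names and the sample values are assumptions chosen for the example, and the numbers are not results from the project.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not taken from the thesis).
actual = [40.0, 50.0, 60.0, 80.0]
forecast = [38.0, 53.0, 57.0, 84.0]

print(f"MSE={mse(actual, forecast):.2f}  MAE={mae(actual, forecast):.2f}  "
      f"RMSE={rmse(actual, forecast):.2f}  MAPE={mape(actual, forecast):.2f}%")
```

MAPE is scale-free, which makes it easy to read, but it becomes unstable when actual ozone values are close to zero. That is one reason to report it alongside RMSE and MAE instead of on its own.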

Challenges & Solutions

Model Selection and Comparison

Challenge: Needed to identify the most suitable algorithms for time series forecasting of air quality data and establish fair comparison criteria.

Solution: Selected 4 distinct approaches (statistical: SARIMAX; probabilistic: DeepAR; deep learning: LSTM and Neural Prophet) and evaluated all of them on the same dataset with the same regression metrics (MSE, MAE, RMSE, MAPE).
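
One way to keep such a comparison fair is to score every model on the same chronological hold-out. The sketch below shows that idea under stated assumptions: `last_value_forecast` and `mean_forecast` are hypothetical stand-ins for the real models (SARIMAX, DeepAR, LSTM, Neural Prophet), and the series is made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def chronological_split(series, test_size):
    """Hold out the last `test_size` points; the data is never shuffled."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


# Hypothetical stand-in forecasters: each takes the training data and a
# horizon, and returns `horizon` predictions. Real models would plug in here.
def last_value_forecast(train, horizon):
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    return [sum(train) / len(train)] * horizon


def compare(series, forecasters, test_size):
    """Score every forecaster on the same hold-out and rank by RMSE (best first)."""
    train, test = chronological_split(series, test_size)
    scores = {name: rmse(test, fn(train, len(test))) for name, fn in forecasters.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Illustrative series only (not LAAQL data).
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
ranking = compare(
    series,
    {"last_value": last_value_forecast, "mean": mean_forecast},
    test_size=2,
)
for name, score in ranking:
    print(f"{name}: RMSE={score:.3f}")
```

Because every forecaster sees the same training window and is scored against the same test window, differences in the ranking reflect the models and not the data split.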

Time Series Data Preprocessing

Challenge: The air quality data contained missing values and outliers, and it required temporal feature engineering before the models could produce accurate predictions.

Solution: Implemented a data cleaning pipeline with interpolation for missing values, outlier detection, and temporal feature extraction. Created train/validation/test splits that respect temporal order, so no model is evaluated on data that precedes its training period.
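
The cleaning steps described above could look roughly like the following pandas sketch. The specifics are assumptions made for illustration, not the thesis implementation: the `o3` column name, the 1.5 × IQR outlier rule, time-based linear interpolation, the calendar features chosen, and the daily toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="o3"):
    """Clean a DataFrame that has a DatetimeIndex and a numeric target column."""
    out = df.sort_index().copy()

    # Outlier detection (assumed rule): flag values outside 1.5 * IQR
    # and treat them as missing.
    q1, q3 = out[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    is_outlier = (out[target] < q1 - 1.5 * iqr) | (out[target] > q3 + 1.5 * iqr)
    out.loc[is_outlier, target] = np.nan

    # Fill gaps by linear interpolation weighted by the time between samples.
    out[target] = out[target].interpolate(method="time")

    # Temporal features derived from the index.
    out["dayofweek"] = out.index.dayofweek
    out["month"] = out.index.month
    out["dayofyear"] = out.index.dayofyear
    return out


def temporal_split(df, val_size, test_size):
    """Split into train/validation/test by position, preserving time order."""
    train_end = len(df) - val_size - test_size
    val_end = len(df) - test_size
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


# Toy daily series (not LAAQL data) with one gap and one obvious outlier.
index = pd.date_range("2022-01-01", periods=20, freq="D")
values = [30.0 + i for i in range(20)]
values[5] = np.nan
values[10] = 500.0
raw = pd.DataFrame({"o3": values}, index=index)

clean = preprocess(raw)
train, val, test = temporal_split(clean, val_size=3, test_size=3)
print(len(train), len(val), len(test))
```

Splitting by position instead of sampling at random keeps every validation and test timestamp later than the training data, which avoids leaking future observations into model fitting.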

Project Information

Timeline

Started: Sep 2022

Last updated: Mar 2023

Role

Researcher / Developer

Project Metrics

Best RMSE: 3.29%

Best MAPE: 9.55%

Models Compared: 4

External Links

View on GitHub