This repository contains Python notebooks developed during my internship, showcasing the application of machine learning techniques to real-world datasets.
The work covers the full data-driven workflow, from data cleaning and preprocessing to clustering, outlier detection, and predictive modelling. In particular, the repository demonstrates the use of classical Artificial Neural Networks (ANNs) for prediction tasks and compares their performance with Gaussian Process (GP) regression, which additionally provides uncertainty quantification through confidence intervals.
- `cleaning_data.ipynb`: Data cleaning and preprocessing routines, including handling missing values and data normalisation.
- `detection_outliers.ipynb`: Methods for detecting and analysing outliers in the dataset.
- `clustering_select_data.ipynb`: Clustering techniques used to explore the dataset structure and to select representative data samples.
- `data_to_csv.ipynb`: Utilities for converting raw data into `.csv` format for further analysis and modelling.

- `ANN_Application.ipynb`: Application of classical Artificial Neural Networks for regression and prediction tasks.
- `GP_Application.ipynb`: Application of Gaussian Process regression, including comparison with ANN predictions and analysis of prediction uncertainty.

- `README.md`: Repository overview and usage description.
- `LICENSE`: GNU General Public License v3.0 (GPL-3.0).
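
The data preparation steps covered by the first notebooks (filling missing values, normalisation, outlier detection, and cluster-based sample selection) can be sketched roughly as follows. This is a minimal NumPy illustration and not the notebooks' actual code: the function names, the median imputation, the IQR rule, the basic k-means, and the toy array are all assumptions made for the example.

```python
import numpy as np


def clean_and_normalise(X):
    """Fill NaNs with the column median, then z-score each column."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std


def iqr_outlier_mask(X, k=1.5):
    """Flag rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    outside = (X < q1 - k * iqr) | (X > q3 + k * iqr)
    return outside.any(axis=1)


def kmeans_representatives(X, k, n_iter=50, seed=0):
    """Run a basic k-means and return the index of the sample nearest each centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))


# Toy data (hypothetical): one missing value and one extreme row.
raw = np.array([
    [1.0, 20.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [4.0, 40.0],
    [100.0, 50.0],
])

cleaned = clean_and_normalise(raw)
outliers = iqr_outlier_mask(cleaned)
inliers = cleaned[~outliers]
representatives = kmeans_representatives(inliers, k=2)

print("outlier rows:", np.where(outliers)[0])
print("representative inlier rows:", representatives)
```

For real datasets, the imputation strategy, the outlier threshold `k`, and the number of clusters would need to be chosen to suit the data.
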
- End-to-end machine learning workflow
- Comparison between ANN and GP regression models
- Uncertainty quantification using Gaussian Processes
- Practical examples using real datasets
- The notebooks are intended for educational and demonstration purposes.
- Some notebooks may require adapting file paths or hyperparameters depending on the dataset used.
- The code is written to prioritise clarity and reproducibility.
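
The uncertainty quantification that distinguishes GP regression from a point-predicting ANN can be illustrated with a short sketch. It implements exact zero-mean GP regression with a squared-exponential kernel in plain NumPy. The kernel choice, the hyperparameter values, and the sine toy data are assumptions for illustration only, and `GP_Application.ipynb` may well use a different library or kernel.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    # Training covariance with observation-noise variance on the diagonal.
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    K_ss = rbf_kernel(x_test, x_test, length_scale)

    # Cholesky factorisation for numerically stable solves.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    std = np.sqrt(np.clip(var, 0.0, None))  # clip tiny negative round-off
    return mean, std


# Toy data (hypothetical): samples of a sine curve on [0, 5].
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)

# One test point inside the training range, one far outside it.
x_test = np.array([2.5, 10.0])
mean, std = gp_predict(x_train, y_train, x_test)

# Approximate 95% interval under the Gaussian posterior.
lower = mean - 1.96 * std
upper = mean + 1.96 * std

for x, m, lo, hi in zip(x_test, mean, lower, upper):
    print(f"x={x:5.2f}  mean={m:+.3f}  95% interval=[{lo:+.3f}, {hi:+.3f}]")
```

The interval widens sharply at the point far from the training data, which is the kind of information a standard ANN regressor does not return alongside its prediction.
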