Cristopher-Morales/Machine_Learning_Notebooks

Machine Learning Internship Project

This repository contains Python notebooks developed during my internship, showcasing the application of machine learning techniques to real-world datasets.

The work covers the full data-driven workflow, from data cleaning and preprocessing to clustering, outlier detection, and predictive modelling. In particular, the repository demonstrates the use of classical Artificial Neural Networks (ANNs) for prediction tasks and compares their performance with Gaussian Process (GP) regression, which additionally provides uncertainty quantification through confidence intervals.

Repository Contents

Data Preprocessing and Analysis

  • cleaning_data.ipynb
    Data cleaning and preprocessing routines, including handling missing values and data normalisation.

  • detection_outliers.ipynb
    Methods for detecting and analysing outliers in the dataset.

  • clustering_select_data.ipynb
    Clustering techniques used to explore the dataset structure and to select representative data samples.

  • data_to_csv.ipynb
    Utilities for converting raw data into .csv format for further analysis and modelling.
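As a rough illustration of the preprocessing steps listed above, the following minimal sketch imputes missing values, flags outliers, and normalises the remaining rows using only NumPy. It is not taken from the notebooks: the function names (`impute_median`, `flag_outliers`, `standardise`), the median imputation, the interquartile-range rule, and the toy array are all assumptions chosen for illustration, and the notebooks may use different methods.

```python
import numpy as np

def impute_median(X):
    """Replace NaN entries with the median of their column (assumed strategy)."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)      # per-column median, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def flag_outliers(X, k=1.5):
    """Return a boolean mask of rows with any value outside the IQR fences."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.any((X < lower) | (X > upper), axis=1)

def standardise(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                    # guard against constant columns
    return (X - mean) / std, mean, std

# Hypothetical toy data: one missing value and one obviously extreme row.
raw = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 12.0],
    [4.0, 11.0],
    [100.0, 13.0],
])

clean = impute_median(raw)
mask = flag_outliers(clean)
scaled, mean, std = standardise(clean[~mask])   # normalise the inliers only
```

Keeping the returned `mean` and `std` makes it possible to apply the same scaling to new data before prediction.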

Machine Learning Applications

  • ANN_Application.ipynb
    Application of classical Artificial Neural Networks for regression and prediction tasks.

  • GP_Application.ipynb
    Application of Gaussian Process regression, including comparison with ANN predictions and analysis of prediction uncertainty.
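To make the uncertainty aspect concrete, here is a minimal NumPy sketch of Gaussian Process regression with a squared-exponential (RBF) kernel. It returns a predictive mean and standard deviation, from which an approximate 95% interval follows as mean ± 1.96 × std. A standard ANN regressor, by contrast, returns only a point prediction. The function names (`rbf_kernel`, `gp_predict`), the kernel choice, the hyperparameter values, and the sine toy data are assumptions for illustration and are not taken from `GP_Application.ipynb`.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-4, **kernel_args):
    """Posterior mean and standard deviation of the latent function.

    `noise` is the assumed observation-noise variance added to the diagonal.
    """
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kernel_args)
    K_ss = rbf_kernel(X_test, X_test, **kernel_args)

    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    std = np.sqrt(np.maximum(var, 0.0))              # clip tiny negative values
    return mean, std

# Hypothetical toy problem: noiseless samples of sin(x) on [0, 5].
X_train = np.linspace(0.0, 5.0, 8)[:, None]
y_train = np.sin(X_train).ravel()
X_test = np.array([[2.5], [10.0]])                   # inside and far outside the data

mean, std = gp_predict(X_train, y_train, X_test)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # approximate 95% interval
```

The predicted standard deviation is small near the training points and grows towards the prior value far from them, which is the behaviour that distinguishes the GP from a point-estimate model.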

Documentation

  • README.md
    Repository overview and usage description.

  • LICENSE
    GNU General Public License v3.0 (GPL-3.0).

Key Features

  • End-to-end machine learning workflow
  • Comparison between ANN and GP regression models
  • Uncertainty quantification using Gaussian Processes
  • Practical examples using real datasets

Notes

  • The notebooks are intended for educational and demonstration purposes.
  • Some notebooks may require adapting file paths or hyperparameters depending on the dataset used.
  • The code is written to prioritise clarity and reproducibility.
