This repository is a monorepo containing the source code, experiments, and visualizations for my Bachelor's degree module on Data Mining and Machine Learning.
It documents my progress in understanding the mathematical and logical foundations of standard classification and clustering algorithms.
Weka vs. Custom Implementation
The official curriculum for this module uses the Weka workbench (Waikato Environment for Knowledge Analysis) for practical exercises. While Weka is an excellent tool for rapid prototyping and applying existing models, it hides the internal logic of the algorithms.
To understand the material in depth, I am mirroring the course exercises by implementing the algorithms from scratch (primarily in Python). This approach allows me to:
- Debug the mathematical steps (e.g., Entropy and Information Gain calculations).
- Understand the specific limitations and edge cases of each model.
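As an illustration of the kind of step that becomes debuggable this way, here is a minimal sketch of the Entropy and Information Gain calculations. The function names `entropy` and `information_gain`, and the representation of each row as a dict of attribute values, are assumptions made for this example and not necessarily what the code in this repository uses.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the relative frequency p of each class.
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute.

    `rows` is assumed to be a list of dicts mapping attribute name -> value.
    """
    # Group the class labels by the value each row has for `attribute`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder
```

With two equally frequent classes, `entropy` returns 1 bit, and an attribute that separates the classes perfectly has an information gain equal to that full entropy. Intermediate values such as `partitions` and `remainder` can be printed or inspected in a debugger, which is not possible inside Weka's GUI workflow.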
A decision tree algorithm implemented to handle categorical data.
- Key Concepts: Entropy, Information Gain.
- Current Status: Implements recursive tree building. Handles discrete attributes.
- Language: Python 3
- Comparison Tool: Weka 3.8 (used for benchmarking results)
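The recursive tree building described above can be sketched roughly as follows. This is a simplified ID3-style illustration under stated assumptions, not the code in this repository: the names `build_tree` and `predict`, the nested-dict tree representation, and the dict-per-row input format are all hypothetical choices made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Recursively build a tree as nested dicts: {attribute: {value: subtree}}.

    Leaves are plain class labels (assumed to be hashable, non-dict values).
    """
    # Base case 1: every example has the same class, so return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case 2: no attributes left to split on, so return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]

    # Create one branch per observed value of the chosen attribute.
    branches = {}
    for value in {row[best] for row in rows}:
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return {best: branches}


def predict(tree, row):
    """Follow branches until a leaf is reached.

    Raises KeyError for an attribute value not seen during training.
    """
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[row[attribute]]
    return tree
```

A small usage example with made-up data:

```python
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "rain", "windy": "no"}))
```

The sketch deliberately leaves out numeric attributes, missing values, and pruning, which are among the edge cases worth comparing against Weka's output.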