This project implements the ID3 (Iterative Dichotomiser 3) decision tree learning algorithm in Python. The algorithm uses entropy and information gain to choose splits when building a decision tree from a dataset. This implementation prints the entropy and information gain calculations step by step, making the results easy to verify and understand.
- Input Data: The script takes a string input for column names and data rows.
- Preprocessing: The data is parsed into a DataFrame for easy manipulation and processing.
- Entropy Calculation:
  - Calculates the overall entropy of the target variable.
  - Calculates the conditional entropy for each unique value of the input attributes.
- Information Gain:
  - Uses the entropy values to calculate the information gain for each attribute.
  - Outputs the formulas and intermediate steps for transparency.
- Decision Tree Logic: The attribute with the highest information gain is selected for the split (a minimal sketch of these calculations follows this list).
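The core calculations can be sketched as follows. This is a minimal illustration using pandas and NumPy, not the repository's actual code; the function names here are hypothetical:

```python
import numpy as np
import pandas as pd

def entropy(series: pd.Series) -> float:
    """Shannon entropy (base 2) of a categorical column."""
    probs = series.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    """Entropy of the target minus the weighted conditional entropy
    after splitting on `attribute`."""
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

def best_split(df: pd.DataFrame, target: str) -> str:
    """Pick the attribute with the highest information gain."""
    attributes = [col for col in df.columns if col != target]
    return max(attributes, key=lambda a: information_gain(df, a, target))
```

Splitting with `groupby` keeps the per-value weighting explicit, which mirrors the intermediate steps the script prints.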
- Clone this repository:

  ```bash
  git clone https://github.com/ravesandstorm/ID3-Python-Stepwise.git
  ```

- Open the script in your Python environment.
- Modify the `columns` and `data` variables to input your dataset (a hypothetical example follows this list).
- Run the script to view the entropy and information gain calculations for all the data.
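The exact input format depends on the script, but as a hypothetical illustration (the variable contents below are made up to match the example dataset):

```python
# Hypothetical values; the script's actual parsing may expect a different format.
columns = "Weather Parents Cash Exam Decision"
data = """sunny visit rich yes cinema
windy no-visit rich no shopping"""
```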
Given a dataset like the following (only the first two rows are shown; the sample output below corresponds to the full 11-row dataset):
| Weather | Parents | Cash | Exam | Decision |
|---|---|---|---|---|
| sunny | visit | rich | yes | cinema |
| windy | no-visit | rich | no | shopping |
The script calculates:
- Entropy of the dataset
- Conditional entropy for each attribute
- Information gain for each attribute
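These quantities follow the standard ID3 definitions, with base-2 logarithms:

$$H(S) = -\sum_{c} p_c \log_2 p_c$$

$$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

where $p_c$ is the fraction of rows in class $c$ and $S_v$ is the subset of rows where attribute $A$ takes value $v$.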
Sample output:

```
Entropy of Data = 0.9852 = -(6/11)*log(6/11) - (5/11)*log(5/11)
Entropy for 'sunny' = 0.9183 = -(4/6)*log(4/6) - (2/6)*log(2/6)
...
Info. Gain for 'Weather' = 0.1934 = 0.985 - [(6/11)*0.9183 + (5/11)*0.7222]
```
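As a quick sanity check (not part of the script), the `'sunny'` entropy line can be reproduced directly:

```python
import math

# Six 'sunny' rows split 4/2 between two decision classes, base-2 logs.
h_sunny = -(4/6) * math.log2(4/6) - (2/6) * math.log2(2/6)
print(round(h_sunny, 4))  # 0.9183
```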
- Python 3.8+
- Pandas
- NumPy
- `math` (Python standard library; no installation needed)
Feel free to contribute, report issues, or suggest improvements! 😊