This repository contains my solution for Task 5 of the AI & ML Internship. The focus of this task is to explore tree-based models—specifically Decision Trees and Random Forests—using the Heart Disease dataset. The goal is to understand model interpretability, overfitting control, ensemble methods, and feature importance.
**Objective:** Learn tree-based models for classification and regression.
| File Name | Description |
|---|---|
| `heart.csv` | Dataset used for classification |
| `tree_based_models.ipynb` | Jupyter Notebook with all steps: training, tuning, visualization |
| `screenshots/` | Folder containing plots of trees and feature importances |
| `README.md` | Project documentation |
- **Data Exploration**
  - Performed basic EDA using pandas to understand structure, check for null values, and examine the target distribution.
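The EDA step can be sketched as below. A tiny synthetic DataFrame stands in for `heart.csv` so the snippet runs standalone; the column names here are illustrative, and in the notebook the same calls operate on the real dataset.

```python
import pandas as pd

# Stand-in for heart.csv; "age", "chol", and "target" are illustrative
# column names, not the full 13-feature schema of the real dataset.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "target": [1, 1, 0, 1],
})

print(df.shape)                      # structure: (rows, columns)
print(df.isnull().sum())             # null-value check per column
print(df["target"].value_counts())  # class balance of the target
```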
- **Train/Test Split**
  - Divided data into features (X) and labels (y), with an 80/20 split for training and testing.
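A minimal sketch of the split, using synthetic data in place of `heart.csv`; assuming the label column is named `target`, as in the standard Heart Disease dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for heart.csv with a balanced binary target
df = pd.DataFrame({"age": range(20, 120), "target": [0, 1] * 50})

X = df.drop(columns=["target"])  # features
y = df["target"]                 # labels

# 80/20 split; stratify keeps the class proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
```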
- **Decision Tree Classifier**
  - Trained a base decision tree using `DecisionTreeClassifier`.
  - Visualized the tree using `plot_tree` to interpret splits and predictions.
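The training and visualization steps look roughly like this; synthetic features stand in for the heart-disease columns, and the `Agg` backend is set so the script runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to file, not a window
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in for the heart-disease features
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# plot_tree draws each split: feature threshold, impurity, sample counts
fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(tree, filled=True, feature_names=[f"f{i}" for i in range(5)], ax=ax)
fig.savefig("tree.png")
```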
- **Controlling Overfitting**
  - Tuned `max_depth` to limit the depth of the tree and avoid overfitting.
  - Compared train and test accuracy across different depths.
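The depth comparison can be sketched as a simple loop (synthetic data again; the depth values are illustrative). An unconstrained tree memorizes the training set, so the train/test gap widens as depth grows:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Shallow trees underfit; deep trees overfit. Watching train vs. test
# accuracy across depths shows where the gap starts to widen.
for depth in [2, 4, 6, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(clf.score(X_tr, y_tr), 3), round(clf.score(X_te, y_te), 3))
```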
- **Random Forest Classifier**
  - Trained a `RandomForestClassifier` to improve accuracy and reduce overfitting.
  - Compared performance with the decision tree model.
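A sketch of the head-to-head comparison on synthetic data; `n_estimators=200` is an illustrative choice, not necessarily what the notebook uses:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Bagging plus random feature subsets average out single-tree variance
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("tree  :", tree.score(X_te, y_te))
print("forest:", forest.score(X_te, y_te))
```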
- **Feature Importance**
  - Extracted feature importances using `.feature_importances_`.
  - Visualized top contributing features using bar plots.
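The extraction step can be sketched as below; `f0`…`f5` are placeholder feature names standing in for the heart-disease columns:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
names = [f"f{i}" for i in range(6)]  # placeholder feature names

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; sorting surfaces the top contributors
imp = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(imp)
# imp.plot.barh() would draw the kind of bar plot used in the notebook
```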
- **Model Evaluation**
  - Used accuracy scores, confusion matrices, and cross-validation (`cross_val_score`) to assess model performance and robustness.
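The cross-validation call is a one-liner; this sketch uses synthetic data and 5 folds (the fold count in the notebook may differ):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# 5-fold CV: each fold serves once as the held-out test set
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5)
print(scores.mean(), scores.std())
```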
- Python 3.12
- `pandas`, `numpy` for data handling
- `matplotlib`, `seaborn` for plotting
- `scikit-learn` for model building and evaluation
This task deepened my understanding of tree-based models. I learned:
- How decision trees make splits based on feature thresholds.
- How to visualize trees for better interpretability.
- Why random forests (via bagging) outperform single trees.
- How feature importance can guide insights in medical data.
- The trade-off between bias and variance when tuning tree depth.
1. Clone the repository:

   ```bash
   git clone https://github.com/anmolthakur74/task-5-tree-models.git
   cd task-5-tree-models
   ```

2. Install dependencies:

   ```bash
   pip install pandas numpy matplotlib seaborn scikit-learn
   ```

3. Open the notebook:

   ```bash
   jupyter notebook tree_based_models.ipynb
   ```

**Author:** Anmol Thakur
GitHub: [anmolthakur74](https://github.com/anmolthakur74)