AnmolPatel20/Decision-Tree-Implementation

🌳 Decision-Tree-Implementation

Two machine learning projects that implement decision trees with scikit-learn: classification on the Iris flower dataset and regression on the Diabetes progression dataset.


📁 Repository Structure

Decision-Tree-Implementation/
├── decision_tree_classification_iris/
│   ├── Iris_DecisionTreeClassifier.ipynb
│   ├── decision_tree.png
│   ├── iris_pairplot.png
│   └── iris-machinelearning.png
│
└── decision_tree_regression_diabetes/
    ├── Diabetes_DicisionTreeRegression.ipynb
    └── diabetes_tree.png

🛠️ Libraries Used

  • scikit-learn
  • pandas
  • numpy
  • matplotlib
  • seaborn

🌸 Project 1 — Iris Flower Classification

A Decision Tree Classifier that classifies Iris flowers into three species based on sepal and petal measurements.

[Image: Iris flower]

📊 Dataset

The Iris Dataset is a classic multiclass classification dataset from scikit-learn.

  • Samples: 150
  • Features: 4 (sepal length, sepal width, petal length, petal width)
  • Target Classes: 3 — Iris Setosa, Iris Versicolor, Iris Virginica

📈 Exploratory Data Analysis

Pairplot showing feature relationships across all three species:

[Image: Iris pairplot]

⚙️ Workflow

  1. Load the Iris dataset from sklearn.datasets
  2. Create a DataFrame with feature columns
  3. Split into independent (X) and dependent (y) features
  4. Train-Test Split — 80% train / 20% test (random_state=10)
  5. Train a baseline DecisionTreeClassifier
  6. Hyperparameter tuning using GridSearchCV with 5-fold cross-validation
  7. Evaluate using Confusion Matrix and Classification Report
  8. Predict on new flower samples
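
The workflow above (minus the tuning step) can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the variable names and the `random_state` passed to the estimator are assumptions added for repeatability.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Steps 1-2: load the dataset and wrap the features in a DataFrame.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Step 3: independent features (X) and dependent feature (y).
X = df
y = iris.target

# Step 4: 80/20 split with the random_state stated in the workflow.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=10
)

# Step 5: baseline tree with default hyperparameters.
# random_state on the estimator is an assumption, not taken from the README.
baseline = DecisionTreeClassifier(random_state=10)
baseline.fit(X_train, y_train)

# Step 7: evaluate on the held-out 20%.
y_pred = baseline.predict(X_test)
labels = [0, 1, 2]
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)
print(classification_report(
    y_test, y_pred, labels=labels, target_names=iris.target_names
))
```

Rows of the confusion matrix are true classes and columns are predicted classes, so off-diagonal counts are misclassified flowers.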

🔧 Hyperparameter Tuning

param = {
    'criterion'   : ['gini', 'entropy', 'log_loss'],
    'splitter'    : ['best', 'random'],
    'max_depth'   : [1, 2, 3, 4, 5],
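    # 'auto' (an alias of 'sqrt' for classifiers) was removed in scikit-learn 1.3; drop it on newer versions
    'max_features': ['auto', 'sqrt', 'log2']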
}

Scoring metric: accuracy
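
A hedged sketch of how a grid like this might be passed to `GridSearchCV`. It assumes a recent scikit-learn release, so `'auto'` is swapped for `None`; the estimator's `random_state` and the variable names are also assumptions rather than the notebook's code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=10
)

# Assumption: 'auto' is replaced by None because scikit-learn 1.3+ rejects it.
param = {
    "criterion": ["gini", "entropy", "log_loss"],
    "splitter": ["best", "random"],
    "max_depth": [1, 2, 3, 4, 5],
    "max_features": [None, "sqrt", "log2"],
}

# 5-fold cross-validation over every combination, scored by accuracy.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=10),
    param_grid=param,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print("Best parameters :", grid.best_params_)
print("Best CV accuracy:", round(grid.best_score_, 3))
print("Test accuracy   :", round(grid.score(X_test, y_test), 3))
```

After fitting, `grid` refits the best combination on the full training set by default, so `grid.predict(...)` uses the tuned tree.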

🌳 Decision Tree Visualization

[Image: Iris decision tree]
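
A tree figure of this kind can be drawn with `sklearn.tree.plot_tree`. The sketch below is illustrative only: `max_depth=3` and the output filename are assumptions, and the tree is fitted on the full dataset for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()

# max_depth=3 is an illustrative choice, not necessarily the tuned depth.
tree = DecisionTreeClassifier(max_depth=3, random_state=10)
tree.fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 8))
artists = plot_tree(
    tree,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,  # color each node by its majority class
    ax=ax,
)
fig.savefig("iris_tree_sketch.png")  # hypothetical output filename
plt.close(fig)
```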

🧪 Sample Prediction

# Single flower prediction
single_sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = grid.predict(single_sample)
# Output: Iris Setosa

# Multiple flower predictions
new_samples = [
    [5.1, 3.5, 1.4, 0.2],   # Setosa
    [6.0, 2.9, 4.5, 1.5],   # Versicolor
    [6.7, 3.1, 5.6, 2.4],   # Virginica
]
predictions = grid.predict(new_samples)  # one predicted class per row
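
`predict` returns integer class indices when the model is trained on `iris.target`, so a lookup is needed to print species names. A minimal sketch, using a plain tree fitted on all 150 samples as a stand-in for the tuned `grid` object (the `model` name is hypothetical):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Stand-in for the tuned model: a default tree fitted on the whole dataset.
model = DecisionTreeClassifier(random_state=10).fit(iris.data, iris.target)

new_samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.0, 2.9, 4.5, 1.5],
    [6.7, 3.1, 5.6, 2.4],
])

# Map the predicted class indices (0, 1, 2) to species names.
class_indices = model.predict(new_samples)
species = iris.target_names[class_indices]

for sample, name in zip(new_samples, species):
    print(sample, "->", name)
```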

🩺 Project 2 — Diabetes Progression Prediction

A Decision Tree Regressor that predicts the progression of diabetes one year after baseline using 10 medical features.

📊 Dataset

The Diabetes Dataset is a regression dataset from scikit-learn.

  • Samples: 442
  • Features: 10 (age, sex, bmi, bp, s1, s2, s3, s4, s5, s6)
  • Target: Quantitative measure of disease progression one year after baseline

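Note: All feature variables are mean-centered and scaled so that each column's sum of squares equals 1, which is why the values are small numbers near zero rather than raw clinical measurements.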
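
The scaling can be checked directly. This short sketch assumes the default `load_diabetes()` behaviour, where each column is centered and normalized so its sum of squares is 1:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# Each column should have mean ~0 and sum of squares ~1.
col_means = X.mean(axis=0)
col_sum_squares = (X ** 2).sum(axis=0)

print("Column means         :", np.round(col_means, 6))
print("Column sum of squares:", np.round(col_sum_squares, 6))
```

Only the features are scaled; the target `y` stays in its original units.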

⚙️ Workflow

  1. Load the Diabetes dataset from sklearn.datasets
  2. Create a DataFrame with all 10 feature columns
  3. Split into independent (X) and dependent (y) features
  4. Train-Test Split — 70% train / 30% test (random_state=10)
  5. Correlation heatmap to explore feature relationships
  6. Train a baseline DecisionTreeRegressor
  7. Hyperparameter tuning using GridSearchCV with 5-fold cross-validation
  8. Evaluate using R2 Score, MAE, and MSE
  9. Build a selected model using best params from GridSearch
  10. Predict disease progression for a new patient
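
Steps 1 to 6 and step 8 of this workflow might look roughly like the sketch below. Variable names and the estimator's `random_state` are assumptions, and the heatmap itself is left as a comment so the example only needs pandas and scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Steps 1-3: load the data, build a DataFrame, separate X and y.
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target

# Step 4: 70/30 split with the random_state stated in the workflow.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=10
)

# Step 5: correlation matrix behind the heatmap
# (it could be drawn with seaborn.heatmap(corr, annot=True)).
corr = X_train.corr()

# Step 6: baseline regressor with default hyperparameters.
# random_state on the estimator is an assumption, added for repeatability.
baseline = DecisionTreeRegressor(random_state=10)
baseline.fit(X_train, y_train)

# Step 8: evaluate on the held-out 30%.
y_pred = baseline.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("R2 :", round(r2, 3))
print("MAE:", round(mae, 2))
print("MSE:", round(mse, 2))
```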

🔧 Hyperparameter Tuning

param = {
    'criterion'   : ['squared_error', 'friedman_mse', 'absolute_error'],
    'splitter'    : ['best', 'random'],
    'max_depth'   : [1, 2, 3, 4, 5, 10, 15, 20, 25],
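    # 'auto' (equivalent to None for regressors) was removed in scikit-learn 1.3; use None on newer versions
    'max_features': ['auto', 'sqrt', 'log2']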
}

Scoring metric: neg_mean_squared_error

🌳 Decision Tree Visualization

[Image: Diabetes decision tree]

🧪 Sample Prediction

import pandas as pd

# Predicting diabetes progression for a new patient using both models
new_patient = pd.DataFrame(
    [[0.05, 0.05, 0.06, 0.02, -0.04, -0.03, -0.04, -0.002, 0.02, -0.01]],
    columns=['age','sex','bmi','bp','s1','s2','s3','s4','s5','s6']
)

# Assumed names: `grid` is the fitted GridSearchCV object and `selected_model`
# is the tree rebuilt from its best params (step 9 of the workflow)
grid_pred = grid.predict(new_patient)
selected_pred = selected_model.predict(new_patient)

# GridSearch Model: disease progression score is 261.4, which indicates high progression
print("GridSearch Model Prediction  :", round(grid_pred[0], 2))

# Selected Model: disease progression score is 179.48, which indicates moderate progression
print("Selected Model Prediction    :", round(selected_pred[0], 2))

Disease progression score range: ~25 (low) → ~346 (high)
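
The range quoted above can be read off the dataset's target column. The helper below, `relative_position`, is a hypothetical illustration of where a prediction falls within that range; it is a simple min-max position, not a clinical threshold.

```python
from sklearn.datasets import load_diabetes

y = load_diabetes().target
y_min, y_max = y.min(), y.max()
print("Target range:", y_min, "to", y_max)

def relative_position(score):
    """Fraction of the way from the lowest to the highest observed target."""
    return (score - y_min) / (y_max - y_min)

# The two predictions quoted in the sample above.
for score in (261.4, 179.48):
    print(score, "->", round(relative_position(score), 2))
```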

📉 Evaluation Metrics

| Metric   | Description                                        |
|----------|----------------------------------------------------|
| R2 Score | How well the model explains variance in the target |
| MAE      | Mean Absolute Error: average prediction error      |
| MSE      | Mean Squared Error: penalizes larger errors more   |
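
To make the three metrics concrete, here is a small pure-Python sketch of their usual definitions. The function names and the toy values are made up for illustration; in the notebooks the equivalent functions would come from `sklearn.metrics`.

```python
def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared errors; large misses dominate."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, not taken from either project.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print("MAE:", mae(y_true, y_pred))           # 0.5
print("MSE:", mse(y_true, y_pred))           # 0.375
print("R2 :", round(r2(y_true, y_pred), 4))  # 0.9486
```

The single error of 1.0 contributes half of the MAE total but two thirds of the MSE total, which is what "penalizes larger errors more" means in practice.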

📌 Notes

  • Both projects use built-in sklearn datasets (Iris & Diabetes)

🙋 About

I'm on my machine learning journey — building, experimenting and documenting as I go. Every notebook here represents something I've genuinely tried to understand, not just run. 🚀

🙏 Acknowledgements

Thanks to Krish Naik Sir, whose Udemy course has been a great resource throughout this learning journey.

"The best way to learn is to do."

