A collection of two machine learning projects implementing Decision Trees for both Classification and Regression tasks using scikit-learn — covering the Iris flower dataset and the Diabetes progression dataset.
```
Decision-Tree-Implementation/
├── decision_tree_classification_iris/
│   ├── Iris_DecisionTreeClassifier.ipynb
│   ├── decision_tree.png
│   ├── iris_pairplot.png
│   └── iris-machinelearning.png
│
└── decision_tree_regression_diabetes/
    ├── Diabetes_DicisionTreeRegression.ipynb
    └── diabetes_tree.png
```
Tech stack: scikit-learn, pandas, numpy, matplotlib, seaborn
A Decision Tree Classifier that classifies Iris flowers into three species based on sepal and petal measurements.
The Iris Dataset is a classic multiclass classification dataset from scikit-learn.
- Samples: 150
- Features: 4 (sepal length, sepal width, petal length, petal width)
- Target Classes: 3 — Iris Setosa, Iris Versicolor, Iris Virginica
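The dataset described above can be verified in a couple of lines (a quick sanity check, assuming scikit-learn is installed; not part of the notebook itself):

```python
# Quick sanity check of the Iris dataset's shape and classes
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)            # (150, 4)
print(iris.feature_names)         # sepal/petal length and width, in cm
print(list(iris.target_names))    # ['setosa', 'versicolor', 'virginica']
```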
Pairplot showing feature relationships across all three species:
- Load the Iris dataset from `sklearn.datasets`
- Create a DataFrame with feature columns
- Split into independent (X) and dependent (y) features
- Train-test split: 80% train / 20% test (`random_state=10`)
- Train a baseline `DecisionTreeClassifier`
- Hyperparameter tuning using `GridSearchCV` with 5-fold cross-validation
- Evaluate using a Confusion Matrix and Classification Report
- Predict on new flower samples
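The first half of this workflow can be sketched roughly as follows (variable names like `clf` are illustrative, not taken from the notebook):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the dataset and build a DataFrame with named feature columns
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# 80/20 train-test split with a fixed seed, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

# Baseline classifier, before any hyperparameter tuning
clf = DecisionTreeClassifier(random_state=10)
clf.fit(X_train, y_train)
print("Baseline accuracy:", clf.score(X_test, y_test))
```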
```python
param = {
    'criterion'   : ['gini', 'entropy', 'log_loss'],
    'splitter'    : ['best', 'random'],
    'max_depth'   : [1, 2, 3, 4, 5],
    'max_features': ['auto', 'sqrt', 'log2']  # note: 'auto' was removed for trees in scikit-learn 1.3
}
```

Scoring metric: `accuracy`
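Wiring a grid like this into `GridSearchCV` might look as follows (a minimal sketch with a slightly reduced grid so it runs on current scikit-learn versions, where `'auto'` is no longer accepted):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

# Reduced grid: 'auto' omitted, 'log_loss' left out for older sklearn compatibility
param = {
    'criterion': ['gini', 'entropy'],
    'splitter' : ['best', 'random'],
    'max_depth': [1, 2, 3, 4, 5],
}

# 5-fold cross-validation, scored on accuracy
grid = GridSearchCV(DecisionTreeClassifier(random_state=10),
                    param_grid=param, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))
```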
```python
# Single flower prediction
single_sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = grid.predict(single_sample)
# Output: Iris Setosa

# Multiple flower predictions
new_samples = [
    [5.1, 3.5, 1.4, 0.2],  # Setosa
    [6.0, 2.9, 4.5, 1.5],  # Versicolor
    [6.7, 3.1, 5.6, 2.4],  # Virginica
]
predictions = grid.predict(new_samples)
```

A Decision Tree Regressor that predicts the progression of diabetes one year after baseline using 10 medical features.
The Diabetes Dataset is a regression dataset from scikit-learn.
- Samples: 442
- Features: 10 (age, sex, bmi, bp, s1, s2, s3, s4, s5, s6)
- Target: Quantitative measure of disease progression one year after baseline
Note: All feature variables are mean-centered and scaled.
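This scaling note can be checked directly: per the scikit-learn documentation, each feature column is mean-centered and scaled so that its sum of squares equals 1. A quick verification (not part of the notebook):

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# Each column has approximately zero mean ...
print(np.allclose(X.mean(axis=0), 0, atol=1e-8))
# ... and a sum of squares equal to 1
print(np.allclose((X ** 2).sum(axis=0), 1))
```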
- Load the Diabetes dataset from `sklearn.datasets`
- Create a DataFrame with all 10 feature columns
- Split into independent (X) and dependent (y) features
- Train-test split: 70% train / 30% test (`random_state=10`)
- Correlation heatmap to explore feature relationships
- Train a baseline `DecisionTreeRegressor`
- Hyperparameter tuning using `GridSearchCV` with 5-fold cross-validation
- Evaluate using R² Score, MAE, and MSE
- Build a selected model using the best params from GridSearch
- Predict disease progression for a new patient
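The baseline part of this workflow can be sketched as follows (variable names like `reg` are illustrative, not taken from the notebook):

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Load the dataset and build a DataFrame with the 10 named feature columns
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = diabetes.target

# 70/30 train-test split with a fixed seed, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10
)

# Baseline regressor, before any hyperparameter tuning
reg = DecisionTreeRegressor(random_state=10)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

print("R2 :", r2_score(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
print("MSE:", mean_squared_error(y_test, pred))
```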
```python
param = {
    'criterion'   : ['squared_error', 'friedman_mse', 'absolute_error'],
    'splitter'    : ['best', 'random'],
    'max_depth'   : [1, 2, 3, 4, 5, 10, 15, 20, 25],
    'max_features': ['auto', 'sqrt', 'log2']  # note: 'auto' was removed for trees in scikit-learn 1.3
}
```

Scoring metric: `neg_mean_squared_error`
```python
# Predicting diabetes progression for a new patient using both models
new_patient = pd.DataFrame(
    [[0.05, 0.05, 0.06, 0.02, -0.04, -0.03, -0.04, -0.002, 0.02, -0.01]],
    columns=['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
)

# GridSearch model: a score of 261.4 indicates high progression
print("GridSearch Model Prediction :", round(grid_pred[0], 2))

# Selected model: a score of 179.48 indicates moderate progression
print("Selected Model Prediction :", round(selected_pred[0], 2))
```

Disease progression score range: ~25 (low) → ~346 (high)
| Metric | Description |
|---|---|
| R² Score | How well the model explains variance in the target |
| MAE | Mean Absolute Error: the average absolute prediction error |
| MSE | Mean Squared Error: penalizes larger errors more heavily |
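The difference between MAE and MSE in the table is easy to see with a toy example (illustrative numbers, not from the notebooks): two predictions with the same total error get the same MAE, but the one that concentrates the error in a single large miss gets a much larger MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true  = np.array([100.0, 100.0, 100.0])
y_small = np.array([110.0, 110.0, 110.0])  # three errors of 10
y_large = np.array([100.0, 100.0, 130.0])  # one error of 30

# MAE is the same for both: total absolute error is 30 in each case
print(mean_absolute_error(y_true, y_small))  # 10.0
print(mean_absolute_error(y_true, y_large))  # 10.0

# MSE penalizes the single large error much more
print(mean_squared_error(y_true, y_small))   # 100.0
print(mean_squared_error(y_true, y_large))   # 300.0
```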
- Both projects use built-in sklearn datasets (Iris & Diabetes)
I'm on my machine learning journey — building, experimenting and documenting as I go. Every notebook here represents something I've genuinely tried to understand, not just run. 🚀
Thanks to Krish Naik Sir whose Udemy course has been a great resource throughout this learning journey.
"The best way to learn is to do."