This project explores stacking, a powerful ensemble learning technique that improves predictive performance by combining several models. Instead of relying on a single model, stacking lets different models work together, with a final model (the meta-learner) turning their outputs into one improved prediction.
Here, I apply stacking to a house price prediction problem, but the concept is useful for many types of regression and classification tasks.
Stacking is an ensemble learning technique that uses multiple models to make predictions and then combines those predictions using another model. This final model—the meta-learner—figures out the best way to weigh and use the individual models' outputs.
- Train Base Models – Train several different models on the dataset (e.g., Linear Regression, Random Forest, Gradient Boosting).
- Generate Predictions – Each base model makes predictions on the dataset; these predictions are not used directly but serve as new input features.
- Train a Meta-Model – A meta-model (such as Ridge Regression) learns how to combine these predictions optimally.
- Final Prediction – The meta-model produces the final, improved prediction.
Here’s a simple visual representation of the stacking process, followed by a minimal code sketch of these steps:
```
Raw Data
   │
   ▼
┌──────────────────────────────────────────┐
│ Train Base Models (Linear, RF, XGB, LGB) │
└──────────────────────────────────────────┘
   │
   ▼
┌───────────────────────────────────────┐
│ Base Model Predictions (New Features) │
└───────────────────────────────────────┘
   │
   ▼
┌─────────────────────────────────────┐
│ Train Meta-Model (Ridge Regression) │
└─────────────────────────────────────┘
   │
   ▼
Final Prediction
```
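Here is a minimal sketch of this pipeline using scikit-learn's `StackingRegressor`, which generates the out-of-fold base predictions internally. The hyperparameters are illustrative, and `X_train`, `y_train`, and `X_test` are assumed to be prepared beforehand (xgboost and lightgbm must be installed):

```python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Base learners: each one sees the raw features.
base_models = [
    ("linear", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=300, random_state=42)),
    ("xgb", XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42)),
    ("lgb", LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=42)),
]

# Meta-learner: Ridge combines the base models' predictions.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=Ridge(alpha=1.0),
    cv=5,  # out-of-fold predictions are generated internally with 5-fold CV
)

stack.fit(X_train, y_train)        # X_train / y_train assumed to exist
final_preds = stack.predict(X_test)
```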
The following models were used as base learners:
- Linear Regression – Simple, interpretable baseline.
- Random Forest Regressor – Captures non-linearity and interactions.
- Gradient Boosting (XGBoost & LightGBM) – Handles complex patterns efficiently.
I used cross-validation to generate out-of-fold predictions for each model. These predictions were then used as input features for the meta-learner.
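To make the out-of-fold step concrete, here is a minimal sketch using `cross_val_predict`: every prediction for a given row comes from a model that never saw that row during training, which avoids leakage into the meta-learner. The models, fold count, and hyperparameters are illustrative, and `X_train`/`y_train` are assumed to exist:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

base_models = {
    "linear": LinearRegression(),
    "rf": RandomForestRegressor(n_estimators=300, random_state=42),
    "xgb": XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42),
    "lgb": LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=42),
}

# Each column holds one base model's out-of-fold predictions.
oof_features = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5)
    for model in base_models.values()
])
```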
I used Ridge Regression as the meta-model because:
- Its coefficients act as learned weights, giving a stable blend of the base predictions.
- Its L2 regularization guards against overfitting to any single base model.
- It handles correlated inputs well (and stacked predictions are often highly correlated).
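Continuing the sketch above, the meta-model is simply a Ridge regression fit on the out-of-fold prediction matrix (`oof_features` and `y_train` come from the previous snippet; the alpha value is illustrative):

```python
from sklearn.linear_model import Ridge

# Fit the meta-learner on the base models' out-of-fold predictions.
meta_model = Ridge(alpha=1.0)
meta_model.fit(oof_features, y_train)

# The learned coefficients show how much weight each base model receives.
print(dict(zip(base_models.keys(), meta_model.coef_)))
```

At inference time, each base model is refit on the full training set, its test-set predictions are stacked into the same column layout, and the Ridge meta-model combines them into the final prediction.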
I compared the Root Mean Squared Error (RMSE) for:
- Each individual model
- The stacked model
The stacked model consistently outperformed the individual models, demonstrating the benefit of combining multiple learners.
| Model | RMSE (Lower is Better) |
|---|---|
| Linear Regression | 27878.786 |
| Random Forest | 25708.871 |
| XGBoost | 23123.952 |
| LightGBM | 23398.139 |
| GradientBoost | 23123.952 |
| Stacked Model | 21906.231 |
While exact values depend on data splits and hyperparameters, the stacked model outperformed all individual models.
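For reference, RMSE can be computed along these lines (a sketch; `y_val` and the per-model prediction variables are hypothetical placeholders for each model's validation-set predictions):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    # np.sqrt(MSE) works across scikit-learn versions.
    return np.sqrt(mean_squared_error(y_true, y_pred))

for name, preds in {
    "Linear Regression": linear_preds,
    "Random Forest": rf_preds,
    "XGBoost": xgb_preds,
    "LightGBM": lgb_preds,
    "Stacked Model": stacked_preds,
}.items():
    print(f"{name}: {rmse(y_val, preds):.3f}")
```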
- Uses multiple perspectives – Linear models are good for interpretability, while tree-based models handle non-linearity. Stacking blends them.
- Reduces bias & variance – A single model may overfit or underfit, but stacking balances these issues.
- Learns optimal weights automatically – Unlike bagging/boosting, stacking lets the meta-model decide which base models matter most.
This project is mainly focused on showcasing how stacking works. The model can be improved by:
- Better feature engineering – Creating new features to improve predictions.
- Hyperparameter tuning – Using Optuna to find the best settings (see the sketch after this list).
- Using more powerful base models – Incorporating advanced models in the ensemble.
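As an illustration of the tuning idea, here is a minimal Optuna sketch for one base model. The search space and trial count are illustrative, and `X_train`/`y_train` are assumed to exist:

```python
import optuna
from sklearn.model_selection import cross_val_score
from lightgbm import LGBMRegressor

def objective(trial):
    # Illustrative search space for LightGBM.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 200, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 128),
    }
    model = LGBMRegressor(**params, random_state=42)
    # The scorer returns negative RMSE, so flip the sign to minimize RMSE.
    score = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_root_mean_squared_error"
    ).mean()
    return -score

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```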