Merged

42 commits
369d0fd
Fix data leakage in housing data generation
DavidEiglspergerQC Jan 22, 2026
caa00a0
First implementation of additional benchmarks
DavidEiglspergerQC Jan 21, 2026
351e061
new benchmark adaption
DavidEiglspergerQC Jan 22, 2026
171e695
exclude statsmodels
DavidEiglspergerQC Jan 22, 2026
6d49048
adjust benchmarks
DavidEiglspergerQC Jan 26, 2026
71edc78
adapt parameters
DavidEiglspergerQC Jan 26, 2026
4cae856
Delete cv and adjust plotting
DavidEiglspergerQC Jan 26, 2026
42411b7
Resolve conflicts
DavidEiglspergerQC Jan 26, 2026
ac74924
Add scaling
DavidEiglspergerQC Jan 26, 2026
e31b96e
Scaling only for benchmarks, not for golden master tests
DavidEiglspergerQC Jan 27, 2026
bd09f9d
Closed form solution for l2-gaussian
DavidEiglspergerQC Jan 27, 2026
4abe075
Benchmark CLI replacement and separation of benchmarking
DavidEiglspergerQC Jan 28, 2026
c35bb1b
fix CI
DavidEiglspergerQC Jan 28, 2026
eee3877
Improved plotting and some small adaptions/fixes
DavidEiglspergerQC Jan 29, 2026
67bbb17
update comments and fix CI
DavidEiglspergerQC Jan 29, 2026
e11a106
Config in yaml, advanced plotting & storage=auto for glum
DavidEiglspergerQC Jan 30, 2026
3e2d7ac
Incorporate feedback and add some further functionalities
DavidEiglspergerQC Feb 2, 2026
de723ad
chore: retrigger CI
DavidEiglspergerQC Feb 2, 2026
b2119d3
change timeout logic
DavidEiglspergerQC Feb 3, 2026
7f169b3
Refinements & new features
DavidEiglspergerQC Feb 6, 2026
4fae115
Bug fixes, parameter tuning and scaling refinement
DavidEiglspergerQC Feb 9, 2026
0db1cb2
CI fix
DavidEiglspergerQC Feb 9, 2026
80d9ed2
Minor cleaning
DavidEiglspergerQC Feb 9, 2026
863e487
Allow flexible K/N ratio for the simulated problems
DavidEiglspergerQC Feb 10, 2026
423fb4f
adjust defaults and available distributions
DavidEiglspergerQC Feb 10, 2026
68212f7
k_over_n as entry in param grid and rich for tables
DavidEiglspergerQC Feb 11, 2026
5a74424
small adjustments
DavidEiglspergerQC Feb 11, 2026
08fdb2d
pass glmnets max_iter also in cases it doesnt converge,
DavidEiglspergerQC Feb 11, 2026
6b1780d
distribution adjustments
DavidEiglspergerQC Feb 11, 2026
2b403f3
Change goldenmaster generation to only include housing/insurance data…
DavidEiglspergerQC Feb 13, 2026
d1f3714
show maximum bar length for not converged cases
DavidEiglspergerQC Feb 13, 2026
4827a0e
Move num_rows into param_grid to allow for flexible number of rows ac…
DavidEiglspergerQC Feb 16, 2026
a262096
Final changes
DavidEiglspergerQC Feb 17, 2026
a191527
Smal fix to benchmarks.rst
DavidEiglspergerQC Feb 17, 2026
dba960e
CI fix
DavidEiglspergerQC Feb 17, 2026
15dcea8
small changes
DavidEiglspergerQC Feb 17, 2026
b5941d0
Improve wording
DavidEiglspergerQC Feb 20, 2026
616c925
Small fixes & updated figures
DavidEiglspergerQC Feb 20, 2026
8c617df
Incorporate feedback
DavidEiglspergerQC Feb 20, 2026
e365ed5
Don't follow pytest import
stanmart Feb 20, 2026
89f95ab
Tiny wordings
DavidEiglspergerQC Feb 22, 2026
4dd016a
Rerun benchmarks
DavidEiglspergerQC Mar 5, 2026
13 changes: 8 additions & 5 deletions .gitignore
@@ -1,5 +1,11 @@
# Project-specific
benchmark_output/
# Benchmark outputs, only track docs/results.csv and docs/config.yaml
glum_benchmarks/results/*
!glum_benchmarks/results/docs/
!glum_benchmarks/results/docs/**
glum_benchmarks/results/docs/*
!glum_benchmarks/results/docs/results.csv
!glum_benchmarks/results/docs/config.yaml
glum_benchmarks/.cache/

# Files created by templating
dense.cpp
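The layered ignore/negation rules above first ignore everything under `glum_benchmarks/results/`, then carve the `docs/` directory back in, re-ignore its direct children, and finally re-include only the two tracked files. A minimal stdlib sketch of last-match-wins pattern evaluation (a simplification: real git matches per path component and never re-includes files inside a fully ignored directory, which is exactly why the rules are layered this way):

```python
import re

def gitignore_regex(pattern):
    # crude translation: '**' matches across '/', '*' only within one segment
    out, i = "", 0
    while i < len(pattern):
        if pattern[i:i + 2] == "**":
            out += ".*"
            i += 2
        elif pattern[i] == "*":
            out += "[^/]*"
            i += 1
        else:
            out += re.escape(pattern[i])
            i += 1
    return re.compile("^" + out + r"(/.*)?$")

def is_ignored(path, rules):
    """Evaluate (pattern, is_negation) pairs in order; the last match wins."""
    ignored = False
    for pat, negate in rules:
        if gitignore_regex(pat.rstrip("/")).match(path):
            ignored = not negate
    return ignored

RULES = [
    ("glum_benchmarks/results/*", False),
    ("glum_benchmarks/results/docs/", True),
    ("glum_benchmarks/results/docs/**", True),
    ("glum_benchmarks/results/docs/*", False),
    ("glum_benchmarks/results/docs/results.csv", True),
    ("glum_benchmarks/results/docs/config.yaml", True),
]
```

Under these rules, arbitrary benchmark outputs stay untracked while `results.csv` and `config.yaml` under `docs/` remain in version control.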
@@ -137,9 +143,6 @@ mlruns
*.pdf
*.lprof

# GLM_BENCHMARKS_CACHE
cache

# pkgs
pkgs/*

2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -41,7 +41,7 @@ repos:
- id: mypy
name: mypy
entry: pixi run -e default mypy
exclude: (^tests/|^src/glum_benchmarks/orig_sklearn_fork/)
exclude: ^tests/
language: system
types: [python]
require_serial: true
12 changes: 8 additions & 4 deletions README.md
@@ -13,19 +13,23 @@

Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!

The goal of `glum` is to be at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports
We believe that for GLM development, broad support for distributions, regularization, and statistical inference, along with fast formula-based specification, is key. `glum` supports

* Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
* L1 regularization, which produces sparse and easily interpretable solutions
* L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
* Elastic net regularization
* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
* Built-in formula-based model specification using `formulaic`
* Classical statistical inference for unregularized models
* Box constraints, linear inequality constraints, sample weights, offsets
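The "regularization path" idea behind the built-in cross validation can be sketched with a toy coordinate-descent lasso that warm-starts each penalty level from the previous solution. This is a minimal stdlib sketch to illustrate the concept, not `glum`'s actual solver; the function names and the fixed number of passes are illustrative choices.

```python
def soft_threshold(z, a):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0

def lasso_cd(X, y, alpha, beta=None, passes=100):
    """Coordinate descent for 0.5 * ||y - X @ beta||^2 + alpha * ||beta||_1."""
    n, k = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * k
    for _ in range(passes):
        for j in range(k):
            # residual with feature j's contribution excluded
            r = [y[i] - sum(X[i][m] * beta[m] for m in range(k) if m != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            denom = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, alpha) / denom
    return beta

def lasso_path(X, y, alphas):
    """Solve from the strongest to the weakest penalty, warm-starting each fit."""
    path, beta = [], None
    for a in sorted(alphas, reverse=True):
        beta = lasso_cd(X, y, a, beta)
        path.append((a, list(beta)))
    return path
```

Warm-starting makes each successive fit cheap because the solution changes only gradually along the path, which is what makes exploring many regularization strengths affordable.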

This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that when N >> K (there are more observations than predictors), it is consistently much faster for a wide range of problems.
Performance also matters, so we conducted extensive benchmarks against other modern libraries. Although performance depends on the specific problem, we find that when N >> K (there are more observations than predictors), `glum` is consistently much faster for a wide range of problems. This repo includes the benchmarking tools in the `glum_benchmarks` module. For details, [see here](glum_benchmarks/README.md).

![Performance benchmarks](docs/_static/headline_benchmark.png#gh-light-mode-only)
![Performance benchmarks](docs/_static/headline_benchmark_dark.png#gh-dark-mode-only)
<!-- BENCHMARK_FIGURES_START -->
<img src="docs/_static/wide-insurance-gamma-normalized.png#gh-light-mode-only" alt="Benchmark results" width="600">
<img src="docs/_static/wide-insurance-gamma-normalized_dark.png#gh-dark-mode-only" alt="Benchmark results" width="600">
<!-- BENCHMARK_FIGURES_END -->

For more information on `glum`, including tutorials and API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).

130 changes: 116 additions & 14 deletions docs/benchmarks.rst
@@ -1,30 +1,132 @@
Benchmarks against glmnet and H2O
Benchmarks
=================================

The following benchmarks were run on a MacBook Pro laptop with a quad-core Intel Core i5.
The following benchmarks were run on a MacBook Pro laptop with an Apple M4 Max chip.

The title of each plot refers to both which dataset the benchmark was run on and whether a L2 ridge regression penalty or an L1 lasso penalty was included. For example "Narrow-Insurance-Ridge" was run on the ``narrow-insurance`` dataset with a ridge regression penalty. Each dataset/penalty pair is tested on five distributions that cover most of the common GLM types. The outcome variable is modified appropriately so that the behavior is similar to that expected for the distribution. For example, for the Poisson regression, we predict the number of claims per person. And for the binomial regression, we predict whether any given individual has ever had a claim. For the ``housing`` dataset, we only test three distributions because it does not contain count data that can be used as an outcome.
Each plot title indicates the dataset and distribution used. For example, "Wide-Insurance-Gamma" refers to the ``wide-insurance`` dataset fit with a gamma distribution. Further information about the datasets can be found at the end of the document.

Note that glum was originally developed to solve problems where N >> K (number of observations is larger than the number of predictors), which is the case for the following benchmarks.
For each dataset/distribution pair, we benchmark three regularization types:

If a bar goes out of the range of the chart, the exact runtime is printed on the bar with an arrow indicating that the bar is truncated.
- Elastic net (``l1_ratio=0.5``): ``elastic-net``
- Ridge (``l1_ratio=0.0``): ``ridge``
- Lasso (``l1_ratio=1.0``): ``lasso``
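The three regularization types are the endpoints and midpoint of the elastic-net mixing parameter, under the usual convention that the penalty is ``alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||_2^2)``. A small sketch of the mapping (the function name is illustrative, not part of the benchmark code):

```python
def regularization_type(l1_ratio):
    """Map the elastic-net mixing parameter to the benchmark's label."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    if l1_ratio == 0.0:
        return "ridge"       # pure L2 penalty
    if l1_ratio == 1.0:
        return "lasso"       # pure L1 penalty
    return "elastic-net"     # a convex mix of both
```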

.. image:: _static/narrow-insurance-l2.png
For each dataset, we extract a target variable suited to each distribution (for example, insurance claim counts for the Poisson models).

Runtime plots are reported relative to ``glum``: for each benchmark case, ``glum``'s runtime is normalized to 1.0 and other libraries' runtimes are scaled accordingly. If a bar exceeds the plotting range, the exact runtime is printed on the bar and an arrow indicates truncation.

We compare ``glum`` against ``sklearn``, ``skglm``, ``glmnet``, ``h2o`` and ``celer``. Combinations that a library does not support are shown as ``N/A`` (not supported). If a library does not converge (it reaches ``max_iter`` or exceeds the 100 s timeout), it is shown as ``NC`` (not converged) at the maximum bar height.
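The normalization and labeling described above can be sketched as follows. This is an illustrative stand-alone helper, not the benchmark's actual plotting code; the constant and function names are assumptions.

```python
TIMEOUT_S = 100.0  # hypothetical constant mirroring the 100 s budget described above

def normalized_results(runtimes, converged, baseline="glum"):
    """Runtimes relative to the baseline; unsupported -> 'N/A', else 'NC' if
    the library failed to converge or exceeded the timeout."""
    base = runtimes[baseline]
    out = {}
    for lib, t in runtimes.items():
        if t is None:
            out[lib] = "N/A"          # library does not support this case
        elif not converged.get(lib, False) or t > TIMEOUT_S:
            out[lib] = "NC"           # plotted at the maximum bar height
        else:
            out[lib] = t / base       # baseline library is normalized to 1.0
    return out
```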

``glum`` was designed for settings with N >> K, that is, many more observations than predictors (aside from high-cardinality categorical features). This regime is well illustrated by the wide-insurance benchmark. For insurance data, we evaluate gamma, Poisson, and Tweedie distributions.

.. BENCHMARK_FIGURES_START

.. image:: _static/wide-insurance-poisson-normalized.png
:width: 700
.. image:: _static/narrow-insurance-lasso.png

.. image:: _static/wide-insurance-gamma-normalized.png
:width: 700
.. image:: _static/intermediate-insurance-l2.png

.. image:: _static/wide-insurance-tweedie-p=1.5-normalized.png
:width: 700
.. image:: _static/intermediate-insurance-lasso.png

.. BENCHMARK_FIGURES_END

To showcase ``glum``'s performance on another dataset, we also report results for ``intermediate-housing``, which has N >> K and only numerical (no categorical) features. For this dataset, we benchmark gamma and Gaussian distributions.

.. BENCHMARK_FIGURES_START

.. image:: _static/intermediate-housing-gamma-normalized.png
:width: 700
.. image:: _static/wide-insurance-l2.png

.. image:: _static/intermediate-housing-gaussian-normalized.png
:width: 700

Note that the ``r-glmnet`` result for the ``wide-insurance-ridge`` Poisson benchmark is missing because ``glmnet`` did not converge after several hours of runtime.
.. BENCHMARK_FIGURES_END


``glum`` is primarily optimized for N >> K settings and is not tuned for N ~ K or N < K. The simulated benchmark with varying K/N ratios illustrates this: ``glum`` performs best when N >> K, and its relative performance decreases as K/N increases.

.. image:: _static/wide-insurance-lasso.png
For K/N = 2, we include an unnormalized runtime plot, because in the normalized version the ``glmnet`` bar becomes too small to read clearly.
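The simulated Gaussian problems with a configurable K/N ratio can be sketched with a stdlib-only generator. This is an illustrative sketch, not the benchmark's actual data-generation code; the function name, noise scale, and seeding are assumptions.

```python
import random

def make_simulated_glm(num_rows, k_over_n, noise=0.1, seed=0):
    """Generate a dense Gaussian regression problem with K = round(k_over_n * N)."""
    rng = random.Random(seed)
    k = max(1, round(k_over_n * num_rows))
    X = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(num_rows)]
    beta = [rng.gauss(0, 1) for _ in range(k)]
    # linear signal plus Gaussian noise
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0, noise) for row in X]
    return X, y
```

For example, ``make_simulated_glm(10_000, 0.5)`` yields the (10 000, 5 000) case from the table below, while K/N = 2 with N = 1 000 yields an underdetermined problem with 2 000 predictors.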

.. BENCHMARK_FIGURES_START

.. image:: _static/simulated-glm-gaussian-k-over-n-0.01-normalized.png
:width: 700
.. image:: _static/intermediate-housing-l2.png

.. image:: _static/simulated-glm-gaussian-k-over-n-0.1-normalized.png
:width: 700
.. image:: _static/intermediate-housing-lasso.png

.. image:: _static/simulated-glm-gaussian-k-over-n-0.5-normalized.png
:width: 700

.. image:: _static/simulated-glm-gaussian-k-over-n-1-normalized.png
:width: 700

.. image:: _static/simulated-glm-gaussian-k-over-n-2.png
:width: 700

.. BENCHMARK_FIGURES_END

More information about the datasets used can be found in the following table. After filtering for ``ClaimAmountCut > 0`` in the "Wide-Insurance-Gamma" dataset, only about 25,000 rows remain. We therefore artificially enlarge the dataset by sampling with replacement and adding noise. This filter also explains why the number of columns after one-hot encoding is smaller than for the other distributions on this dataset: some category levels exist only in the dropped rows.

For ``simulated-glm`` we reduce N from 10 000 to 1 000 for K/N = 1 and K/N = 2 to speed things up (with N = 10 000, almost no library converges within the 100 s limit).
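The sampling-with-replacement-plus-noise enlargement described above can be sketched with the stdlib. This is an illustrative sketch, not the benchmark's actual preprocessing; the jitter scale and function name are assumptions, and only numeric values are perturbed while categorical values are copied unchanged.

```python
import random

def upsample_with_noise(rows, target_n, noise_scale=0.01, seed=0):
    """Resample rows with replacement, jittering float values proportionally."""
    rng = random.Random(seed)
    out = []
    for _ in range(target_n):
        row = rng.choice(rows)
        out.append([
            v + rng.gauss(0, noise_scale * abs(v)) if isinstance(v, float) else v
            for v in row
        ])
    return out
```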

.. list-table:: Dataset Overview
:header-rows: 1
:widths: 30 10 5 5 10 40

* - (Dataset, Distribution)
- (N, K)
- Cat. Columns
- Num. Columns
- Columns (OHE)
- Source
* - (wide-insurance, poisson), (wide-insurance, tweedie)
- (600 000, 9)
- 8
- 1
- 322
- `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing
* - (wide-insurance, gamma)
- (600 000, 9)
- 8
- 1
- 256
- `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing
* - (intermediate-housing, gaussian), (intermediate-housing, gamma)
- (21 613, 10)
- 0
- 10
- 10
- `house_sales <https://www.openml.org/search?type=data&id=42092>`_ + feature engineering/preprocessing
* - (simulated-glm, gaussian) with K/N = 0.01
- (10 000, 100)
- 0
- 100
- 100
- simulated
* - (simulated-glm, gaussian) with K/N = 0.1
- (10 000, 1 000)
- 0
- 1 000
- 1 000
- simulated
* - (simulated-glm, gaussian) with K/N = 0.5
- (10 000, 5 000)
- 0
- 5 000
- 5 000
- simulated
* - (simulated-glm, gaussian) with K/N = 1
- (1 000, 1 000)
- 0
- 1 000
- 1 000
- simulated
* - (simulated-glm, gaussian) with K/N = 2
- (1 000, 2 000)
- 0
- 2 000
- 2 000
- simulated