Quantco · DavidEiglspergerQC · Mar 5, 2026 · Jan 22, 2026 · Jan 21, 2026 · Jan 22, 2026
@@ -1,5 +1,11 @@
-# Project-specific
-benchmark_output/
+# Benchmark outputs, only track docs/results.csv and docs/config.yaml
+glum_benchmarks/results/*
+!glum_benchmarks/results/docs/
+!glum_benchmarks/results/docs/**
+glum_benchmarks/results/docs/*
+!glum_benchmarks/results/docs/results.csv
+!glum_benchmarks/results/docs/config.yaml
+glum_benchmarks/.cache/
 
 # Files created by templating
 dense.cpp
@@ -137,9 +143,6 @@ mlruns
 *.pdf
 *.lprof
 
-# GLM_BENCHMARKS_CACHE
-cache
-
 # pkgs
 pkgs/*
 

@@ -41,7 +41,7 @@ repos:
       - id: mypy
         name: mypy
         entry: pixi run -e default mypy
-        exclude: (^tests/|^src/glum_benchmarks/orig_sklearn_fork/)
+        exclude: ^tests/
         language: system
         types: [python]
         require_serial: true

@@ -13,19 +13,23 @@
 
 Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
 
-The goal of `glum` is to be at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports
+We believe that for GLM development, broad support for distributions, regularization, and statistical inference, along with fast formula-based specification, is key. `glum` supports
 
 * Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
 * L1 regularization, which produces sparse and easily interpretable solutions
 * L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
 * Elastic net regularization
 * Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
+* Built-in formula-based model specification using `formulaic`
+* Classical statistical inference for unregularized models
 * Box constraints, linear inequality constraints, sample weights, offsets
 
-This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that when N >> K (there are more observations than predictors), it is consistently much faster for a wide range of problems.
+Performance also matters, so we benchmarked `glum` extensively against other modern libraries. Although performance depends on the specific problem, we find that when N >> K (there are many more observations than predictors), `glum` is consistently much faster for a wide range of problems. This repo includes the benchmarking tools in the `glum_benchmarks` module. For details, [see here](glum_benchmarks/README.md).
 
-![Performance benchmarks](docs/_static/headline_benchmark.png#gh-light-mode-only)
-![Performance benchmarks](docs/_static/headline_benchmark_dark.png#gh-dark-mode-only)
+<!-- BENCHMARK_FIGURES_START -->
+<img src="docs/_static/wide-insurance-gamma-normalized.png#gh-light-mode-only" alt="Benchmark results" width="600">
+<img src="docs/_static/wide-insurance-gamma-normalized_dark.png#gh-dark-mode-only" alt="Benchmark results" width="600">
+<!-- BENCHMARK_FIGURES_END -->
 
 For more information on `glum`, including tutorials and API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).
 

@@ -1,30 +1,132 @@
-Benchmarks against glmnet and H2O
+Benchmarks
 =================================
 
-The following benchmarks were run on a MacBook Pro laptop with a quad-core Intel Core i5.
+The following benchmarks were run on a MacBook Pro laptop with an Apple M4 Max chip.
 
-The title of each plot refers to both which dataset the benchmark was run on and whether a L2 ridge regression penalty or an L1 lasso penalty was included. For example "Narrow-Insurance-Ridge" was run on the ``narrow-insurance`` dataset with a ridge regression penalty. Each dataset/penalty pair is tested on five distributions that cover most of the common GLM types. The outcome variable is modified appropriately so that the behavior is similar to that expected for the distribution. For example, for the Poisson regression, we predict the number of claims per person. And for the binomial regression, we predict whether any given individual has ever had a claim. For the ``housing`` dataset, we only test three distributions because it does not contain count data that can be used as an outcome.
+Each plot title indicates the dataset and distribution used. For example, "Wide-Insurance-Gamma" refers to the ``wide-insurance`` dataset fit with a gamma distribution. Further information about the datasets can be found at the end of the document.
 
-Note that glum was originally developed to solve problems where N >> K (number of observations is larger than the number of predictors), which is the case for the following benchmarks.
+For each dataset/distribution pair, we benchmark three regularization types:
 
-If a bar goes out of the range of the chart, the exact runtime is printed on the bar with an arrow indicating that the bar is truncated.
+- Elastic net (``l1_ratio=0.5``): ``elastic-net``
+- Ridge (``l1_ratio=0.0``): ``ridge``
+- Lasso (``l1_ratio=1.0``): ``lasso``
 
-.. image:: _static/narrow-insurance-l2.png
+For each dataset, we derive a target variable and fit it with a distribution typical for that kind of outcome (for example, insurance claim counts with a Poisson model).
+
+Runtime plots are reported relative to ``glum``: for each benchmark case, ``glum``'s runtime is normalized to 1.0 and other libraries' runtimes are scaled accordingly. If a bar exceeds the plotting range, the exact runtime is printed on the bar and an arrow indicates truncation.
+
+We compare ``glum`` against ``sklearn``, ``skglm``, ``glmnet``, ``h2o`` and ``celer``. Not every library supports every benchmark case; unsupported combinations are shown as ``N/A`` (not supported). If a library does not converge (it either reaches ``max_iter`` or exceeds the 100 s timeout), it is shown as ``NC`` (not converged) at the maximum bar height.
+
+``glum`` was designed for settings with N >> K, that is, many more observations than predictors, although individual categorical predictors may have high cardinality. The ``wide-insurance`` benchmark illustrates this regime well. For insurance data, we evaluate gamma, Poisson, and Tweedie distributions.
+
+.. BENCHMARK_FIGURES_START
+
+.. image:: _static/wide-insurance-poisson-normalized.png
    :width: 700
-.. image:: _static/narrow-insurance-lasso.png
+
+.. image:: _static/wide-insurance-gamma-normalized.png
    :width: 700
-.. image:: _static/intermediate-insurance-l2.png
+
+.. image:: _static/wide-insurance-tweedie-p=1.5-normalized.png
    :width: 700
-.. image:: _static/intermediate-insurance-lasso.png
+
+.. BENCHMARK_FIGURES_END
+
+To show ``glum``'s performance on another dataset, we also report results for ``intermediate-housing``, which has N >> K and only numerical (no categorical) features. For this dataset, we benchmark gamma and Gaussian distributions.
+
+.. BENCHMARK_FIGURES_START
+
+.. image:: _static/intermediate-housing-gamma-normalized.png
    :width: 700
-.. image:: _static/wide-insurance-l2.png
+
+.. image:: _static/intermediate-housing-gaussian-normalized.png
    :width: 700
 
-Note that the ``r-glmnet`` result for the ``wide-insurance-ridge`` Poisson benchmark is missing because ``glmnet`` did not converge after several hours of runtime.
+.. BENCHMARK_FIGURES_END
+
+
+``glum`` is primarily optimized for N >> K settings, and is not tuned for N ~ K or N < K. This is illustrated by the simulated benchmark with varying K/N ratios: ``glum`` performs best when N >> K, and relative performance decreases as K/N increases.
 
-.. image:: _static/wide-insurance-lasso.png
+For K/N = 2, we include an unnormalized runtime plot, because in the normalized version the ``glmnet`` bar becomes too small to read clearly.
+
+.. BENCHMARK_FIGURES_START
+
+.. image:: _static/simulated-glm-gaussian-k-over-n-0.01-normalized.png
    :width: 700
-.. image:: _static/intermediate-housing-l2.png
+
+.. image:: _static/simulated-glm-gaussian-k-over-n-0.1-normalized.png
    :width: 700
-.. image:: _static/intermediate-housing-lasso.png
+
+.. image:: _static/simulated-glm-gaussian-k-over-n-0.5-normalized.png
    :width: 700
+
+.. image:: _static/simulated-glm-gaussian-k-over-n-1-normalized.png
+   :width: 700
+
+.. image:: _static/simulated-glm-gaussian-k-over-n-2.png
+   :width: 700
+
+.. BENCHMARK_FIGURES_END
+
+The table below gives more information about the datasets. For the gamma benchmark on ``wide-insurance``, filtering for ``ClaimAmountCut > 0`` leaves only about 25 000 rows, so we enlarge the dataset artificially by sampling with replacement and adding noise. The filter also explains why this case has fewer columns after one-hot encoding than the other distributions on the same dataset: some category levels occur only in the dropped rows.
+
+For ``simulated-glm``, we reduce N from 10 000 to 1 000 for K/N = 1 and K/N = 2 to keep runtimes manageable: with N = 10 000, almost no library converges within the 100 s limit.
+
+.. list-table:: Dataset Overview
+   :header-rows: 1
+   :widths: 30 10 5 5 10 40
+
+   * - (Dataset, Distribution)
+     - (N, K)
+     - Cat. Columns
+     - Num. Columns
+     - Columns (OHE)
+     - Source
+   * - (wide-insurance, poisson), (wide-insurance, tweedie)
+     - (600 000, 9)
+     - 8
+     - 1
+     - 322
+     - `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing
+   * - (wide-insurance, gamma)
+     - (600 000, 9)
+     - 8
+     - 1
+     - 256
+     - `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing
+   * - (intermediate-housing, gaussian), (intermediate-housing, gamma)
+     - (21 613, 10)
+     - 0
+     - 10
+     - 10
+     - `house_sales <https://www.openml.org/search?type=data&id=42092>`_ + feature engineering/preprocessing
+   * - (simulated-glm, gaussian) with K/N = 0.01
+     - (10 000, 100)
+     - 0
+     - 100
+     - 100
+     - simulated
+   * - (simulated-glm, gaussian) with K/N = 0.1
+     - (10 000, 1 000)
+     - 0
+     - 1 000
+     - 1 000
+     - simulated
+   * - (simulated-glm, gaussian) with K/N = 0.5
+     - (10 000, 5 000)
+     - 0
+     - 5 000
+     - 5 000
+     - simulated
+   * - (simulated-glm, gaussian) with K/N = 1
+     - (1 000, 1 000)
+     - 0
+     - 1 000
+     - 1 000
+     - simulated
+   * - (simulated-glm, gaussian) with K/N = 2
+     - (1 000, 2 000)
+     - 0
+     - 2 000
+     - 2 000
+     - simulated
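
The benchmarks page in this diff says runtimes are reported relative to ``glum``, with ``glum``'s runtime normalized to 1.0 for each benchmark case. A minimal sketch of that scaling follows. It is not the code used in `glum_benchmarks`: the function name `normalize_to_baseline`, the use of `None` for unsupported or non-converged runs, and the example runtimes are all assumptions made for illustration.

```python
from typing import Dict, Optional


def normalize_to_baseline(
    runtimes: Dict[str, Optional[float]], baseline: str = "glum"
) -> Dict[str, Optional[float]]:
    """Divide every runtime by the baseline's runtime for one benchmark case.

    Entries that are None (hypothetical stand-in for N/A or NC) stay None.
    """
    base = runtimes.get(baseline)
    if base is None or base <= 0:
        raise ValueError(f"baseline {baseline!r} needs a positive runtime")
    return {
        lib: (None if t is None else t / base) for lib, t in runtimes.items()
    }


# Made-up runtimes in seconds for a single benchmark case.
example = {"glum": 0.5, "glmnet": 2.0, "h2o": 1.25, "celer": None}
normalized = normalize_to_baseline(example)
print(normalized)
```

With this scaling, a value of 4.0 for a library means it took four times as long as ``glum`` on that case, which is how the normalized bars would be read.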