Fix/update benchmarks #968
Merged
42 commits
369d0fd Fix data leakage in housing data generation (DavidEiglspergerQC)
caa00a0 First implementation of additional benchmarks (DavidEiglspergerQC)
351e061 new benchmark adaption (DavidEiglspergerQC)
171e695 exclude statsmodels (DavidEiglspergerQC)
6d49048 adjust benchmarks (DavidEiglspergerQC)
71edc78 adapt parameters (DavidEiglspergerQC)
4cae856 Delete cv and adjust plotting (DavidEiglspergerQC)
42411b7 Resolve conflicts (DavidEiglspergerQC)
ac74924 Add scaling (DavidEiglspergerQC)
e31b96e Scaling only for benchmarks, not for golden master tests (DavidEiglspergerQC)
bd09f9d Closed form solution for l2-gaussian (DavidEiglspergerQC)
4abe075 Benchmark CLI replacement and separation of benchmarking (DavidEiglspergerQC)
c35bb1b fix CI (DavidEiglspergerQC)
eee3877 Improved plotting and some small adaptions/fixes (DavidEiglspergerQC)
67bbb17 update comments and fix CI (DavidEiglspergerQC)
e11a106 Config in yaml, advanced plotting & storage=auto for glum (DavidEiglspergerQC)
3e2d7ac Incorporate feedback and add some further functionalities (DavidEiglspergerQC)
de723ad chore: retrigger CI (DavidEiglspergerQC)
b2119d3 change timeout logic (DavidEiglspergerQC)
7f169b3 Refinements & new features (DavidEiglspergerQC)
4fae115 Bug fixes, parameter tuning and scaling refinement (DavidEiglspergerQC)
0db1cb2 CI fix (DavidEiglspergerQC)
80d9ed2 Minor cleaning (DavidEiglspergerQC)
863e487 Allow flexible K/N ratio for the simulated problems (DavidEiglspergerQC)
423fb4f adjust defaults and available distributions (DavidEiglspergerQC)
68212f7 k_over_n as entry in param grid and rich for tables (DavidEiglspergerQC)
5a74424 small adjustments (DavidEiglspergerQC)
08fdb2d pass glmnets max_iter also in cases it doesnt converge, (DavidEiglspergerQC)
6b1780d distribution adjustments (DavidEiglspergerQC)
2b403f3 Change goldenmaster generation to only include housing/insurance data… (DavidEiglspergerQC)
d1f3714 show maximum bar length for not converged cases (DavidEiglspergerQC)
4827a0e Move num_rows into param_grid to allow for flexible number of rows ac… (DavidEiglspergerQC)
a262096 Final changes (DavidEiglspergerQC)
a191527 Smal fix to benchmarks.rst (DavidEiglspergerQC)
dba960e CI fix (DavidEiglspergerQC)
15dcea8 small changes (DavidEiglspergerQC)
b5941d0 Improve wording (DavidEiglspergerQC)
616c925 Small fixes & updated figures (DavidEiglspergerQC)
8c617df Incorporate feedback (DavidEiglspergerQC)
e365ed5 Don't follow pytest import (stanmart)
89f95ab Tiny wordings (DavidEiglspergerQC)
4dd016a Rerun benchmarks (DavidEiglspergerQC)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,30 +1,132 @@ | ||
| Benchmarks against glmnet and H2O | ||
| Benchmarks | ||
| ================================= | ||
|
|
||
| The following benchmarks were run on a MacBook Pro laptop with a quad-core Intel Core i5. | ||
| The following benchmarks were run on a MacBook Pro laptop with an Apple M4 Max chip. | ||
|
|
||
| The title of each plot refers to both which dataset the benchmark was run on and whether a L2 ridge regression penalty or an L1 lasso penalty was included. For example "Narrow-Insurance-Ridge" was run on the ``narrow-insurance`` dataset with a ridge regression penalty. Each dataset/penalty pair is tested on five distributions that cover most of the common GLM types. The outcome variable is modified appropriately so that the behavior is similar to that expected for the distribution. For example, for the Poisson regression, we predict the number of claims per person. And for the binomial regression, we predict whether any given individual has ever had a claim. For the ``housing`` dataset, we only test three distributions because it does not contain count data that can be used as an outcome. | ||
| Each plot title indicates the dataset and distribution used. For example, "Wide-Insurance-Gamma" refers to the ``wide-insurance`` dataset fit with a gamma distribution. Further information about the datasets can be found at the end of the document. | ||
|
|
||
| Note that glum was originally developed to solve problems where N >> K (number of observations is larger than the number of predictors), which is the case for the following benchmarks. | ||
| For each dataset/distribution pair, we benchmark three regularization types: | ||
|
|
||
| If a bar goes out of the range of the chart, the exact runtime is printed on the bar with an arrow indicating that the bar is truncated. | ||
| - Elastic net (``l1_ratio=0.5``): ``elastic-net`` | ||
| - Ridge (``l1_ratio=0.0``): ``ridge`` | ||
| - Lasso (``l1_ratio=1.0``): ``lasso`` | ||
|
|
||
| .. image:: _static/narrow-insurance-l2.png | ||
| We extract target variables and benchmark them under typical distributions (for example, insurance claim counts using Poisson models). | ||
|
|
||
| Runtime plots are reported relative to ``glum``: for each benchmark case, ``glum``'s runtime is normalized to 1.0 and other libraries' runtimes are scaled accordingly. If a bar exceeds the plotting range, the exact runtime is printed on the bar and an arrow indicates truncation. | ||
|
|
||
| We compare ``glum`` against ``sklearn``, ``skglm``, ``glmnet``, ``h2o`` and ``celer``. As some libraries do not support all benchmark cases, these combinations are shown as ``N/A`` (not supported). If a library does not converge (either it reaches ``max_iter`` or exceeds the 100s timeout), it is shown as ``NC`` (not converged) at the maximum bar height. | ||
|
|
||
| glum was designed for settings with N >> K —that is, many more observations than predictors, apart from high-cardinality categorical features. This regime is well illustrated by the wide-insurance benchmark. For insurance data, we evaluate gamma, Poisson, and Tweedie distributions. | ||
|
|
||
| .. BENCHMARK_FIGURES_START | ||
|
|
||
| .. image:: _static/wide-insurance-poisson-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/narrow-insurance-lasso.png | ||
|
|
||
| .. image:: _static/wide-insurance-gamma-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/intermediate-insurance-l2.png | ||
|
|
||
| .. image:: _static/wide-insurance-tweedie-p=1.5-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/intermediate-insurance-lasso.png | ||
|
|
||
| .. BENCHMARK_FIGURES_END | ||
|
|
||
| To showcase ``glum’s`` performance on another dataset, we also report results for ``intermediate-housing``, which has N >> K and only numerical (no categorical) features. For this dataset, we benchmark gamma and Gaussian distributions. | ||
|
|
||
| .. BENCHMARK_FIGURES_START | ||
|
|
||
| .. image:: _static/intermediate-housing-gamma-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/wide-insurance-l2.png | ||
|
|
||
| .. image:: _static/intermediate-housing-gaussian-normalized.png | ||
| :width: 700 | ||
|
|
||
| Note that the ``r-glmnet`` result for the ``wide-insurance-ridge`` Poisson benchmark is missing because ``glmnet`` did not converge after several hours of runtime. | ||
| .. BENCHMARK_FIGURES_END | ||
|
|
||
|
|
||
| ``glum`` is primarily optimized for N >> K settings, and is not tuned for N ~ K or N < K. This is illustrated by the simulated benchmark with varying K/N ratios: ``glum`` performs best when N >> K, and relative performance decreases as K/N increases. | ||
|
|
||
| .. image:: _static/wide-insurance-lasso.png | ||
| For K/N = 2, we include an unnormalized runtime plot, because in the normalized version the ``glmnet`` bar becomes too small to read clearly. | ||
|
|
||
| .. BENCHMARK_FIGURES_START | ||
|
|
||
| .. image:: _static/simulated-glm-gaussian-k-over-n-0.01-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/intermediate-housing-l2.png | ||
|
|
||
| .. image:: _static/simulated-glm-gaussian-k-over-n-0.1-normalized.png | ||
| :width: 700 | ||
| .. image:: _static/intermediate-housing-lasso.png | ||
|
|
||
| .. image:: _static/simulated-glm-gaussian-k-over-n-0.5-normalized.png | ||
| :width: 700 | ||
|
|
||
| .. image:: _static/simulated-glm-gaussian-k-over-n-1-normalized.png | ||
| :width: 700 | ||
|
|
||
| .. image:: _static/simulated-glm-gaussian-k-over-n-2.png | ||
| :width: 700 | ||
|
|
||
| .. BENCHMARK_FIGURES_END | ||
|
|
||
| In the following table more information about the used datasets can be found. After filtering for ``ClaimAmountCut > 0`` in the "Wide-Insurance-Gamma" dataset, only about 25,000 rows are left. We, therefore, artificially increase the dataset by sampling with replacement and adding noise. The filter is also why the number of columns after one-hot-encoding is smaller compared to the other distributions on this dataset because some category levels only exist in the dropped rows. | ||
|
|
||
| For ``simulated-glm`` we reduce N from 10 000 to 1 000 for K/N = 1 and K/N = 2 in order to speed things up (with N = 10 000 nearly no library converges within the 100s limit). | ||
|
|
||
| .. list-table:: Dataset Overview | ||
| :header-rows: 1 | ||
| :widths: 30 10 5 5 10 40 | ||
|
|
||
| * - (Dataset, Distribution) | ||
| - (N, K) | ||
| - Cat. Columns | ||
| - Num. Columns | ||
| - Columns (OHE) | ||
| - Source | ||
| * - (wide-insurance, poisson), (wide-insurance, tweedie) | ||
| - (600 000, 9) | ||
| - 8 | ||
| - 1 | ||
| - 322 | ||
| - `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing | ||
| * - (wide-insurance, gamma) | ||
| - (600 000, 9) | ||
| - 8 | ||
| - 1 | ||
| - 256 | ||
| - `freMTPL2 <https://www.openml.org/search?type=data&id=41214>`_ + feature engineering/preprocessing | ||
| * - (intermediate-housing, poisson), (intermediate-housing, gamma) | ||
| - (21 613, 10) | ||
| - 0 | ||
| - 10 | ||
| - 10 | ||
| - `house_sales <https://www.openml.org/search?type=data&id=42092>`_ + feature engineering/preprocessing | ||
| * - (simulated-glm, gaussian) with K/N = 0.01 | ||
| - (10 000, 100) | ||
| - 0 | ||
| - 100 | ||
| - 100 | ||
| - simulated | ||
| * - (simulated-glm, gaussian) with K/N = 0.1 | ||
| - (10 000, 1 000) | ||
| - 0 | ||
| - 1 000 | ||
| - 1 000 | ||
| - simulated | ||
| * - (simulated-glm, gaussian) with K/N = 0.5 | ||
| - (10 000, 5 000) | ||
| - 0 | ||
| - 5 000 | ||
| - 5 000 | ||
| - simulated | ||
| * - (simulated-glm, gaussian) with K/N = 1 | ||
| - (1 000, 1 000) | ||
| - 0 | ||
| - 1 000 | ||
| - 1 000 | ||
| - simulated | ||
| * - (simulated-glm, gaussian) with K/N = 2 | ||
| - (1 000, 2 000) | ||
| - 0 | ||
| - 2 000 | ||
| - 2 000 | ||
| - simulated |
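
The updated ``benchmarks.rst`` reports runtimes relative to ``glum``, with ``glum`` fixed at 1.0 and unsupported or non-converged cases shown as ``N/A`` or ``NC``. A minimal Python sketch of that normalization follows. It is an illustration only, not the code from this PR: the function name ``normalize_runtimes``, the use of ``None`` for missing results, and the sample numbers are all assumptions made for the example.

```python
from typing import Dict, Optional


def normalize_runtimes(
    runtimes: Dict[str, Optional[float]], baseline: str = "glum"
) -> Dict[str, Optional[float]]:
    """Scale each library's runtime so that the baseline library equals 1.0.

    Hypothetical helper: ``None`` stands in for a case that is not
    supported (N/A) or did not converge (NC) and is passed through unchanged.
    """
    base = runtimes.get(baseline)
    if base is None or base <= 0:
        raise ValueError(f"baseline {baseline!r} needs a positive runtime")
    return {
        lib: (None if seconds is None else seconds / base)
        for lib, seconds in runtimes.items()
    }


# Made-up runtimes in seconds for one benchmark case.
example = {"glum": 2.0, "sklearn": 5.0, "celer": None}
normalized = normalize_runtimes(example)
```

Here ``normalized["glum"]`` is 1.0 and ``normalized["sklearn"]`` is 2.5, so a value above 1.0 means the library was slower than ``glum`` on that case.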