feat(stats_tests): implement KS test by mdahlin · Pull Request #329 · statrs-dev/statrs

mdahlin · 2025-02-16T04:21:04Z

Implements versions of the one-sample and two-sample KS test

codecov · 2025-02-16T04:22:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.72%. Comparing base (f4136d5) to head (fa6c3f5).
Report is 13 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #329      +/-   ##
==========================================
+ Coverage   94.26%   94.72%   +0.45%     
==========================================
  Files          58       60       +2     
  Lines       12943    14238    +1295     
==========================================
+ Hits        12201    13487    +1286     
- Misses        742      751       +9


mdahlin · 2025-02-16T15:45:10Z

statrs/src/stats_tests/ks_test.rs

Line 246 in 5fad268

KSOneSampleAlternativeMethod::TwoSidedExact => {

My implementation for calculating the exact p-value of a one-sample KS test requires nalgebra to do some matrix multiplication. Do you have suggestions for how to add the feature gate? I'm not too familiar with feature gating, but two ideas came to mind:

- separate `ks_onesample` and `ks_twosample`, and feature gate all of `ks_onesample`
- feature gate only the enum variant (though this seems like it could cause some issues)

YeungOnion · 2025-02-20T17:29:26Z

Thanks for this!

Since feature gating is conditional compilation, we can't gate just the enum variant on its own, because that variant is used along a specific code path. But if you walk down that code path and feature-gate everything along it, the crate will still compile with the feature flag disabled.
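A minimal sketch of that kind of gating, assuming a Cargo feature named `nalgebra` (for example, the implicit feature that an optional `nalgebra` dependency creates). `OneSampleMethod`, `pvalue`, `asymptotic_pvalue` and `exact_pvalue` are hypothetical stand-ins, not the statrs API, and the always-available branch uses the standard Kolmogorov limiting series only as a placeholder. The variant, the match arm that names it, the function it calls, and that function's `use nalgebra::DMatrix;` all sit under the same `#[cfg]`, so the crate compiles with the feature on or off.

```rust
// Sketch only: these names are hypothetical stand-ins, not the statrs API,
// and the "nalgebra" feature name is an assumption.

/// Simplified method selector for a one-sample KS p-value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OneSampleMethod {
    /// Asymptotic Kolmogorov approximation; always compiled.
    TwoSidedAsymptotic,
    /// Exact method; needs matrix multiplication, so it only exists with the feature.
    #[cfg(feature = "nalgebra")]
    TwoSidedExact,
}

pub fn pvalue(d: f64, n: f64, method: OneSampleMethod) -> f64 {
    match method {
        OneSampleMethod::TwoSidedAsymptotic => asymptotic_pvalue(d, n),
        // The arm is gated too: without the feature the variant does not
        // exist, so an ungated arm naming it would fail to compile.
        #[cfg(feature = "nalgebra")]
        OneSampleMethod::TwoSidedExact => exact_pvalue(d, n),
    }
}

/// Kolmogorov limiting distribution, P(K > x) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 x^2),
/// evaluated at x = d * sqrt(n) and truncated after 100 terms.
fn asymptotic_pvalue(d: f64, n: f64) -> f64 {
    let x = d * n.sqrt();
    if x <= 0.0 {
        return 1.0;
    }
    let mut sum = 0.0_f64;
    for j in 1..=100u32 {
        let jf = f64::from(j);
        let sign: f64 = if j % 2 == 1 { 1.0 } else { -1.0 };
        sum += sign * (-2.0 * jf * jf * x * x).exp();
    }
    (2.0 * sum).clamp(0.0, 1.0)
}

/// Only compiled with the feature. The `use` lives here, next to its only
/// usage, so it is gated together with the function.
#[cfg(feature = "nalgebra")]
fn exact_pvalue(d: f64, n: f64) -> f64 {
    use nalgebra::DMatrix;
    let _h = DMatrix::<f64>::zeros(1, 1); // real matrix construction omitted
    let _ = (d, n);
    todo!("exact matrix-based p-value omitted from this sketch")
}
```

Without the feature, `OneSampleMethod::TwoSidedExact` simply does not exist, so any downstream code that matches on it has to gate its own arms the same way.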

I'll use the review tools to make the suggestions so they're near to the code.

YeungOnion · 2025-02-20T17:30:15Z

src/stats_tests/ks_test.rs

+use core::f64;
+use std::iter::zip;
+
+use nalgebra::DMatrix;


Either gate this import or move the `use` statement next to where it's used.

YeungOnion · 2025-02-20T17:31:54Z

src/stats_tests/ks_test.rs

+    /// `n`s and will error with if there are ties in the input data or the input data is too
+    /// large. The threshold for too large is data with length 170 lining up with the
+    /// implementation of [`factorial`] being used.
+    TwoSidedExact,


Gate this variant as you mentioned. I believe the `#[cfg(...)]` attribute can go either before or after the doc comment.

YeungOnion · 2025-02-20T17:32:21Z

src/stats_tests/ks_test.rs

+    2.0 * sum
+}
+
+fn onesample_marsaglia_et_al_twosided_pvalue(d: f64, n: f64) -> Result<f64, KSTestError> {


This function would be gated as well.

YeungOnion · 2025-02-20T18:27:41Z

src/stats_tests/ks_test.rs

+
+    let mut mm = DMatrix::<f64>::zeros(m, m);
+
+    // PERF: definitely a better way to fill the matrix. Also could cache the


There's certainly another way to fill this. I think readability would improve significantly, but I don't expect performance to improve much. I didn't spot any entries being overwritten in my review. Some of the evaluations could use SIMD, though, since you have a few instances of "set the first column by this expression" and "set these antidiagonals by this expression", so it would be interesting to see how large the gains could be.
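For reference, the fill under discussion follows the construction of Marsaglia, Tsang and Wang (2003), the method named in `onesample_marsaglia_et_al_twosided_pvalue`. The sketch below is a plain-`Vec` rendering of that published construction, not the PR's code (which isn't shown here); `build_h` is a hypothetical name. It makes the "first column" and "last row" correction steps explicit and shows that the factorial divisor depends only on i - j, a regularity that a vectorized fill could possibly exploit.

```rust
/// Builds the m x m matrix H of Marsaglia, Tsang & Wang (2003) for the exact
/// one-sample KS CDF, P(D_n < d) = n!/n^n * (H^n)[k-1][k-1] (0-indexed).
/// Returns (H, k), with H stored row-major as Vec<Vec<f64>>.
fn build_h(d: f64, n: u32) -> (Vec<Vec<f64>>, usize) {
    let nd = f64::from(n) * d;
    let k = nd.floor() as usize + 1;
    let m = 2 * k - 1;
    let h = k as f64 - nd;

    // Start from the band pattern: 1 where i - j + 1 >= 0, else 0.
    let mut mat = vec![vec![0.0_f64; m]; m];
    for i in 0..m {
        for j in 0..m {
            if i + 1 >= j {
                mat[i][j] = 1.0;
            }
        }
    }
    // First column and last row are each corrected by powers of h.
    for i in 0..m {
        mat[i][0] -= h.powi(i as i32 + 1);
        mat[m - 1][i] -= h.powi((m - i) as i32);
    }
    // The bottom-left corner gets an extra term when h > 1/2.
    if 2.0 * h - 1.0 > 0.0 {
        mat[m - 1][0] += (2.0 * h - 1.0).powi(m as i32);
    }
    // Divide entry (i, j) by (i - j + 1)! for i - j + 1 > 0; the divisor is
    // shared by every entry on the same diagonal.
    for i in 0..m {
        for j in 0..=i {
            let mut fact = 1.0_f64;
            for g in 1..=(i + 1 - j) {
                fact *= g as f64;
            }
            mat[i][j] /= fact;
        }
    }
    (mat, k)
}
```

As a sanity check, for n = 1 and d = 0.75 this gives H = [[0.5]], and n!/n^n * 0.5 = 0.5 matches P(D_1 < d) = 2d - 1.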

Bigger gains could come from improving how we calculate a single matrix element of M^n, where M = mm, especially when the number of bins is large (matmul is O(N^3)), since nalgebra is primarily targeted at graphics. faer may be a better choice for this, or nalgebra-lapack to connect to battle-tested Fortran, but let's leave that decision out of scope here. For computing a matrix element:

Let u be the unit vector along dimension k, so that the (k, k) element of M^n is u^T M^n u. For even n, with n/2 denoting floor division, this splits as

$$ u^T M^n u = \left(u^T M^{n/2}\right) \left(M^{n/2}u\right) = w^T v, \quad w^T = u^T M^{n/2}, \quad v = M^{n/2}u, \quad n \bmod 2 = 0 $$

which reduces to v^T v only when M is symmetric. For odd n, we raise M to the floor of half the power and add one explicit multiply by M, again with n/2 being floor division:

$$ u^T M^n u = w^T \left(M^{n/2 + 1}u\right) $$
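A minimal sketch of this split, assuming a dense row-major `Vec<Vec<f64>>` instead of nalgebra; `mat_vec`, `vec_mat` and `diag_element_of_power` are hypothetical helper names. It never forms M^n: the row w^T = u^T M^{n/2} and the column v = M^{n/2} u are each built with n/2 matrix-vector products (O(N^2) each), plus one extra product on v for odd n. Because M need not be symmetric, w and v are tracked separately.

```rust
/// Returns M x for a row-major square matrix M.
fn mat_vec(m: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// Returns y^T M, i.e. the row vector y multiplied on the left of M.
fn vec_mat(y: &[f64], m: &[Vec<f64>]) -> Vec<f64> {
    let mut out = vec![0.0_f64; m.len()];
    for (yi, row) in y.iter().zip(m) {
        for (o, mij) in out.iter_mut().zip(row) {
            *o += yi * mij;
        }
    }
    out
}

/// Computes (M^p)[k][k] (0-indexed) without forming M^p:
/// w^T = u^T M^(p/2) and v = M^(p/2) u, one extra product on v when p is odd,
/// then the dot product w^T v.
fn diag_element_of_power(m: &[Vec<f64>], p: u32, k: usize) -> f64 {
    let mut u = vec![0.0_f64; m.len()];
    u[k] = 1.0;
    let mut v = u.clone(); // becomes M^(p/2) u: column k of M^(p/2)
    let mut w = u; // becomes u^T M^(p/2): row k of M^(p/2)
    for _ in 0..p / 2 {
        v = mat_vec(m, &v);
        w = vec_mat(&w, m);
    }
    if p % 2 == 1 {
        v = mat_vec(m, &v); // odd p: M^(p/2 + 1) u
    }
    // M need not be symmetric, so the row and the column are kept separate.
    w.iter().zip(&v).map(|(a, b)| a * b).sum()
}
```

For the non-symmetric M = [[1, 2], [3, 4]], the 0-indexed (1, 1) element of M^2 is 22, whereas v^T v with v = M u = (2, 4) would give 20.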

Diagonalization would be useful here as well. I'm unsure at what size it pays off to diagonalize instead of doing the n/2 matmuls, since both are O(N^3), but matmul seems like less bookkeeping to parallelize.

And the factorial function is already cached, so you're covered there for inputs below 170.
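Why 170 is the cutoff: 170! ≈ 7.26e306 is the largest factorial that fits in an `f64` (whose maximum is about 1.80e308), while 171! overflows to infinity. Below is a minimal sketch of a cached lookup with that bound. `factorial_table` and `checked_factorial` are hypothetical names, not statrs's `factorial`, whose internals aren't shown here.

```rust
/// Largest n whose factorial is finite in f64; 171! overflows to infinity.
const MAX_FACTORIAL_ARG: usize = 170;

/// Precomputes 0!, 1!, ..., 170! as f64 (hypothetical helper, not statrs's cache).
fn factorial_table() -> Vec<f64> {
    let mut table = Vec::with_capacity(MAX_FACTORIAL_ARG + 1);
    let mut acc = 1.0_f64;
    table.push(acc); // 0! = 1
    for i in 1..=MAX_FACTORIAL_ARG {
        acc *= i as f64;
        table.push(acc);
    }
    table
}

/// Looks up n!; returns None when n! is not representable as a finite f64.
fn checked_factorial(table: &[f64], n: usize) -> Option<f64> {
    table.get(n).copied()
}
```

Inputs above 170 return `None`, which a caller could map to the "input too large" error described in the `TwoSidedExact` docstring.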

mdahlin · 2025-02-22

Implemented, thanks for the suggestion. Some quick benchmarks below:

| Input | Original | Updated |
|---|---|---|
| d=0.15; n=8.0 | 638 ns | 521 ns |
| d=0.15; n=135.0 | 928 μs | 60 μs |
| d=0.62; n=8.0 | 4.4 μs | 3.0 μs |
| d=0.62; n=135.0 | 44.3 ms | 2.7 ms |


YeungOnion · 2025-02-24T14:56:20Z

The test failure is related to something we're fixing in #325, so this is good to go. Thanks for the quick bench. If you've got that runner available, would you be able to share it? I'd use it to check whether another optimization is worthwhile.

mdahlin · 2025-03-01T21:55:21Z

> If you've got that runner available, would you be able to share it?

It was just some basic code I pulled together with criterion, so I'm not sure how helpful it is, but see below:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// `original` and `better_matmul` are the two exact p-value implementations
// being compared; they are not shown here.
fn bench_exacts(c: &mut Criterion) {
    let mut group = c.benchmark_group("Exact");
    for i in [(0.15, 8.0), (0.15, 135.0), (0.62, 8.0), (0.62, 135.0)].iter() {
        group.bench_with_input(
            BenchmarkId::new("Original", format!("{:?}", i)),
            i,
            |b, (d, n)| b.iter(|| original(*d, *n)),
        );
        group.bench_with_input(
            BenchmarkId::new("Alternative", format!("{:?}", i)),
            i,
            |b, (d, n)| b.iter(|| better_matmul(*d, *n)),
        );
    }
    group.finish();
}
criterion_group!(benches, bench_exacts);
criterion_main!(benches);
```

feat(stats_tests): implement KS test

015691d

Implements versions of the one-sample and two-sample KS test

refeactor: remove ks_twosample usage of nalgebra

5fad268

YeungOnion reviewed Feb 20, 2025


mdahlin added 2 commits February 22, 2025 13:55

fix: feature gate exact KS test one-sample method

151ce7b

KS test one-sample method requires the nalgebra feature to do matrix multiplication

perf: more efficient calculation for KS one-sample exact

fa6c3f5

improved process to find the k, k matrix element after raising the matrix to a power.

YeungOnion merged commit 7ef8595 into statrs-dev:master Feb 24, 2025
9 of 10 checks passed

YeungOnion merged 4 commits into statrs-dev:master from mdahlin:ks_test
