feat: add min and max #116

Merged
flying-sheep merged 17 commits into scverse:main from quentinblampey:minmax
Oct 28, 2025

Conversation

@quentinblampey
Contributor

Hello @flying-sheep @Zethson,

As mentioned in #112, I added support for min/max, but I'm unsure about my implementation, notably regarding code quality:

  • The code is very similar to the sum function, with minor differences because the min/max operations don't take a dtype argument. I could probably write a function generic enough to handle all of these "simple" operations at once, but I'm worried it would become too abstract and hard to read/maintain. What do you think?
  • Two functions (normalize_axis and get_shape) are shared between sum and min/max, so I wanted to check with you where we should move them. E.g., should I create stats/_utils.py?
  • The docs and tests are still missing, but I want to settle the above points with you first, and then I'll add them.

Closes #112

@codecov

codecov bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.13%. Comparing base (040a3cd) to head (9a30eb1).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   98.33%   99.13%   +0.80%     
==========================================
  Files          17       19       +2     
  Lines         421      464      +43     
==========================================
+ Hits          414      460      +46     
+ Misses          7        4       -3     

☔ View full report in Codecov by Sentry.

@codspeed-hq

codspeed-hq bot commented Aug 27, 2025

CodSpeed Performance Report

Merging #116 will not alter performance

Comparing quentinblampey:minmax (9a30eb1) with main (040a3cd)

Summary

✅ 160 untouched
🆕 72 new

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float32-max] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float32-min] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float64-max] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float64-min] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-int32-max] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-int32-min] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float32-max] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float32-min] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float64-max] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float64-min] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-int32-max] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-int32-min] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float32-max] N/A 2.3 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float32-min] N/A 2.3 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float64-max] N/A 4.5 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float64-min] N/A 4.5 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-int32-max] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-int32-min] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax1-float32-max] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax1-float32-min] N/A 2.2 ms N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@flying-sheep
Member

Hi! Thanks for the PR!

I did some of the deduplication, but all the function bodies are identical except for the presence or absence of dtype and the actual numpy function being called. I'd therefore implement it similarly to here:

https://github.com/quentinblampey/fast-array-utils/blob/15347f7c1d741de4e867790c430b8c7058f049d9/src/fast_array_utils/stats/__init__.py#L273-L291

but I can do that later.
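The suggested deduplication could look roughly like this minimal sketch (a hypothetical builder with illustrative names, not the actual fast-array-utils code, which handles more array types than plain numpy):

```python
import numpy as np


# Hypothetical factory for "simple" reductions: the only things that vary
# per function are the underlying numpy callable and whether a dtype
# parameter exists.
def _make_reduction(name: str, *, has_dtype: bool):
    np_func = getattr(np, name)  # e.g. np.min, np.max, np.sum

    if has_dtype:
        def reduction(x, *, axis=None, dtype=None):
            return np_func(x, axis=axis, dtype=dtype)
    else:
        def reduction(x, *, axis=None):
            return np_func(x, axis=axis)

    reduction.__name__ = name
    return reduction


min_ = _make_reduction("min", has_dtype=False)
sum_ = _make_reduction("sum", has_dtype=True)
```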

@quentinblampey
Contributor Author

quentinblampey commented Oct 6, 2025

Hi @flying-sheep,
Sorry for the delay; I finished the deduplication so that dtype is handled correctly.

A few comments:

  • I saw that for the sum on sparse arrays, you were using x.data.sum instead of x.sum. Is there a reason for this? I assume scipy already uses x.data internally for the sum, but I wanted to check with you, because the new implementation uses x directly (otherwise we'd have issues: e.g., min on an x.data that contains only positive values would never return 0).
  • How should we document these new functions now that we have a generic builder? By updating .__doc__ directly?
  • For the sum with dtype=int, since the dtype cast is applied to the per-chunk results, the outcome was a bit unexpected to me at first. I understand what it does and why, but it's a little unintuitive: it's consistent neither with direct dask usage nor with rounding after a float sum.
>>> import dask.array as da
>>> from fast_array_utils import stats
>>> x = da.random.random((1000, 1000), chunks=(100, 100))
>>> stats.sum(x).compute(), stats.sum(x, dtype=int).compute()
(np.float64(500227.801967892), np.int64(500173))  # large difference

>>> x.sum(dtype=int).compute()
0  # dask casts the elements to int before summing
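The concern from the first bullet about min on sparse arrays can be seen directly with scipy (illustrative snippet):

```python
import numpy as np
from scipy import sparse

# A sparse array with an implicit zero and only positive stored values.
x = sparse.csr_array(np.array([[0.0, 2.0], [3.0, 4.0]]))

print(x.data.min())  # 2.0 -- only looks at stored (nonzero) values
print(x.min())       # 0.0 -- accounts for unstored zeros
```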

@flying-sheep
Member

flying-sheep commented Oct 13, 2025

Hi! I’m also responding late, since I was on a hackathon last week!

  • I saw that for the sum on sparse arrays, you were using x.data.sum instead of x.sum. Is there a reason for this?

Probably not, I assume that x.sum will do the right thing.

  • How to document these new functions now that we have made a generic builder?

Maybe there's no better way than to go back to manual @overloads (perhaps in an if TYPE_CHECKING block).
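That idea could look roughly like this sketch (illustrative names; min_ stands in for a generated function, and the runtime branch is a plain numpy call rather than the real builder):

```python
from typing import TYPE_CHECKING

import numpy as np

if TYPE_CHECKING:
    # Static signature: this is what mypy and the docs build would see.
    def min_(x: np.ndarray, /, *, axis: int | None = None) -> np.ndarray | np.generic: ...
else:
    # At runtime the function would come from the generic builder;
    # a direct numpy call stands in for it here.
    def min_(x, /, *, axis=None):
        return np.min(x, axis=axis)
```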

  • For the sum with dtype=int, since it applies it on chunks, the results were a little bit unexpected for me at first.

What numpy does is this:

dtype: The type of the returned array and of the accumulator in which the elements are summed.

Ah, looks like we don't test stats.sum(<array dtype=np.float*>, dtype=np.int*) properly. For other combinations we test that things behave like numpy does.

I guess that’s a bug then! #124
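For reference, plain numpy follows the accumulator semantics quoted above: with dtype=int, the float elements are cast to int before being summed.

```python
import numpy as np

x = np.array([0.9, 0.9, 0.9, 0.9])

print(x.sum())           # ~3.6
print(x.sum(dtype=int))  # 0 -- each 0.9 is truncated to 0 before summing
```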

@flying-sheep
Member

OK, new info here:

So we're also circumventing the bug for scipy, which makes x.data.sum necessary for performance reasons.

@quentinblampey
Contributor Author

Thanks for the in-depth analysis @flying-sheep, and good catch for the scipy bug!

@flying-sheep
Member

flying-sheep commented Oct 14, 2025

Once the other PR is merged, I think we can make progress here!

Except for the overloads: I still don’t have a good idea on how to get them both into the docs and the typing without repeating ourselves.

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Oct 15, 2025
@flying-sheep
Member

OK, I think I merged it all in, now we just need to fix mypy and the docs and this should be ready!

@quentinblampey
Contributor Author

Hi @flying-sheep, I'm still not sure what the best way is to handle the docs.
Do you want to handle it yourself, or do you want me to try finding a solution?

@flying-sheep
Member

I’ll tackle it, thank you!

@quentinblampey
Contributor Author

Thanks for your help @flying-sheep!

@flying-sheep flying-sheep changed the title Add min max feat: add min and max Oct 28, 2025
@flying-sheep flying-sheep merged commit 951282b into scverse:main Oct 28, 2025
20 checks passed
@quentinblampey
Contributor Author

Awesome! 🎉

@flying-sheep
Member

flying-sheep commented Oct 28, 2025

thank you!

Development

Successfully merging this pull request may close these issues.

Support min/max

2 participants