feat: add min and max #116

Merged
flying-sheep merged 17 commits into scverse:main from quentinblampey:minmax
Oct 28, 2025

Conversation

@quentinblampey
Contributor

Hello @flying-sheep @Zethson,

As mentioned in #112, I added support for min/max, but I'm unsure about my implementation, notably regarding code quality:

  • The code is very similar to the sum function, with minor differences because the min/max operations don't take a dtype argument. I could probably write a function generic enough to handle all of these "simple" operations at once, but I'm worried it would become too abstract and hard to read/maintain. What do you think?
  • Two functions (normalize_axis and get_shape) are shared between sum and min/max, so I wanted to check with you where we should move them. E.g., should I create stats/_utils.py?
  • The docs and tests are still missing, but I want to settle the above points with you first, and then I'll add them.

Closes #112

@codecov

codecov bot commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.13%. Comparing base (040a3cd) to head (9a30eb1).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   98.33%   99.13%   +0.80%     
==========================================
  Files          17       19       +2     
  Lines         421      464      +43     
==========================================
+ Hits          414      460      +46     
+ Misses          7        4       -3     

☔ View full report in Codecov by Sentry.

@codspeed-hq

codspeed-hq bot commented Aug 27, 2025

CodSpeed Performance Report

Merging #116 will not alter performance

Comparing quentinblampey:minmax (9a30eb1) with main (040a3cd)

Summary

✅ 160 untouched
🆕 72 new

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float32-max] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float32-min] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float64-max] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-float64-min] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-int32-max] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-1d-all-int32-min] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float32-max] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float32-min] N/A 2.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float64-max] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-float64-min] N/A 4.1 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-int32-max] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-all-int32-min] N/A 1.6 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float32-max] N/A 2.3 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float32-min] N/A 2.3 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float64-max] N/A 4.5 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-float64-min] N/A 4.5 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-int32-max] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax0-int32-min] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax1-float32-max] N/A 2.2 ms N/A
🆕 test_stats_benchmark[numpy.ndarray-2d-ax1-float32-min] N/A 2.2 ms N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@flying-sheep
Member

Hi! Thanks for the PR!

I did some of the deduplication, but all the function bodies are identical except for the presence or absence of dtype and the actual numpy function being called. I'd therefore implement it similarly to here:

https://github.com/quentinblampey/fast-array-utils/blob/15347f7c1d741de4e867790c430b8c7058f049d9/src/fast_array_utils/stats/__init__.py#L273-L291

but I can do that later.
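The suggested deduplication could look roughly like this minimal sketch (a hypothetical builder with illustrative names, not the actual fast-array-utils code, which handles more array types than plain numpy):

```python
import numpy as np


# Hypothetical factory for "simple" reductions: the only things that vary
# per function are the underlying numpy callable and whether a dtype
# parameter exists.
def _make_reduction(name: str, *, has_dtype: bool):
    np_func = getattr(np, name)  # e.g. np.min, np.max, np.sum

    if has_dtype:
        def reduction(x, *, axis=None, dtype=None):
            return np_func(x, axis=axis, dtype=dtype)
    else:
        def reduction(x, *, axis=None):
            return np_func(x, axis=axis)

    reduction.__name__ = name
    return reduction


min_ = _make_reduction("min", has_dtype=False)
sum_ = _make_reduction("sum", has_dtype=True)
```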

@quentinblampey
Contributor Author

quentinblampey commented Oct 6, 2025

Hi @flying-sheep,
Sorry for the delay; I finished the deduplication so that dtype is handled correctly.

A few comments:

  • I saw that for the sum on sparse arrays, you were using x.data.sum instead of x.sum. Is there a reason for this? I assume scipy already uses x.data internally for the sum, but I wanted to check with you, because the new implementation uses x directly (otherwise we'd have issues: e.g., min on an x.data that contains only positive values would never return 0).
  • How should we document these new functions now that we have a generic builder? By updating .__doc__ directly?
  • For the sum with dtype=int, since the dtype cast is applied to the per-chunk results, the outcome was a bit unexpected to me at first. I understand what it does and why, but it's a little unintuitive: it's consistent neither with direct dask usage nor with rounding after a float sum.
>>> import dask.array as da
>>> from fast_array_utils import stats
>>> x = da.random.random((1000, 1000), chunks=(100, 100))
>>> stats.sum(x).compute(), stats.sum(x, dtype=int).compute()
(np.float64(500227.801967892), np.int64(500173))  # large difference

>>> x.sum(dtype=int).compute()
0  # dask casts the elements to int before summing
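The concern from the first bullet about min on sparse arrays can be seen directly with scipy (illustrative snippet):

```python
import numpy as np
from scipy import sparse

# A sparse array with an implicit zero and only positive stored values.
x = sparse.csr_array(np.array([[0.0, 2.0], [3.0, 4.0]]))

print(x.data.min())  # 2.0 -- only looks at stored (nonzero) values
print(x.min())       # 0.0 -- accounts for unstored zeros
```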

@flying-sheep
Member

flying-sheep commented Oct 13, 2025

Hi! I’m also responding late, since I was on a hackathon last week!

  • I saw that for the sum on sparse arrays, you were using x.data.sum instead of x.sum. Is there a reason for this?

Probably not, I assume that x.sum will do the right thing.

  • How to document these new functions now that we have made a generic builder?

Maybe there's no better way than to go back to manual @overloads (perhaps in an if TYPE_CHECKING block).
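That idea could look roughly like this sketch (illustrative names; min_ stands in for a generated function, and the runtime branch is a plain numpy call rather than the real builder):

```python
from typing import TYPE_CHECKING

import numpy as np

if TYPE_CHECKING:
    # Static signature: this is what mypy and the docs build would see.
    def min_(x: np.ndarray, /, *, axis: int | None = None) -> np.ndarray | np.generic: ...
else:
    # At runtime the function would come from the generic builder;
    # a direct numpy call stands in for it here.
    def min_(x, /, *, axis=None):
        return np.min(x, axis=axis)
```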

  • For the sum with dtype=int, since it applies it on chunks, the results were a little bit unexpected for me at first.

What numpy does is this:

dtype: The type of the returned array and of the accumulator in which the elements are summed.

Ah, looks like we don't test stats.sum(<array dtype=np.float*>, dtype=np.int*) properly. For other combinations we test that things behave like numpy does.

I guess that’s a bug then! #124
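For reference, plain numpy follows the accumulator semantics quoted above: with dtype=int, the float elements are cast to int before being summed.

```python
import numpy as np

x = np.array([0.9, 0.9, 0.9, 0.9])

print(x.sum())           # ~3.6
print(x.sum(dtype=int))  # 0 -- each 0.9 is truncated to 0 before summing
```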

@flying-sheep
Member

OK, new info here:

So we're also circumventing the bug for scipy, which makes x.data.sum necessary for performance reasons.

@quentinblampey
Contributor Author

Thanks for the in-depth analysis @flying-sheep, and good catch for the scipy bug!

@flying-sheep
Member

flying-sheep commented Oct 14, 2025

Once the other PR is merged, I think we can make progress here!

Except for the overloads: I still don’t have a good idea on how to get them both into the docs and the typing without repeating ourselves.

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Oct 15, 2025
@flying-sheep
Member

OK, I think I merged it all in, now we just need to fix mypy and the docs and this should be ready!

@quentinblampey
Contributor Author

Hi @flying-sheep, I'm still not sure what the best way is to handle the docs.
Do you want to handle it yourself, or do you want me to try finding a solution?

@flying-sheep
Member

I’ll tackle it, thank you!

@quentinblampey
Contributor Author

Thanks for your help @flying-sheep!

@flying-sheep flying-sheep changed the title Add min max feat: add min and max Oct 28, 2025
@flying-sheep flying-sheep merged commit 951282b into scverse:main Oct 28, 2025
20 checks passed
@quentinblampey
Contributor Author

Awesome! 🎉

@flying-sheep
Member

flying-sheep commented Oct 28, 2025

thank you!

Development

Successfully merging this pull request may close these issues.

Support min/max

2 participants