ENH: add maxlag and lags parameters to correlate/convolve#6

Open
honi-at-simspace wants to merge 14 commits into master from maxlag2025

Collaborator

@honi-at-simspace honi-at-simspace commented Apr 16, 2026

PR summary

This PR adds maxlag and lags keyword parameters to np.correlate and np.convolve, plus a returns_lagvector flag that returns the corresponding lag indices alongside the result. It also adds a new public C API function PyArray_CorrelateLags.

The motivation is that numpy.correlate has no maxlag feature. Even if I only want the correlation between two time series at lags between -100 and +100 ms, for example, it still computes the correlation at every lag between -20000 and +20000 ms (the full length of the time series).

This PR revives numpy#5978, which I opened back in 2015 and never finished pushing through. At the time I raised the question as issues on numpy and scipy (numpy#5954, scipy/scipy#4940) and on the scipy-dev list. It also got attention on Stack Overflow.

Apologies for the very long downtime. I'm excited to be able to come back to this and finally get this much-needed feature included.

Proposed API

np.correlate(a, v, mode=..., *, maxlag=None, lags=None, returns_lagvector=False)
np.convolve (a, v, mode=..., *, maxlag=None, lags=None, returns_lagvector=False)

Parameter Design

  • maxlag=M (int): symmetric inclusive window [-M, M] (2M+1 lags). Matches MATLAB's xcorr(x, y, M) convention.
  • lags=: a range, a slice with explicit start/stop, or a 1-D integer array_like containing an arithmetic progression.

Both are keyword-only and mutually exclusive. The default mode auto-resolves to 'lags' when either is supplied.
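The "arithmetic progression" requirement on an array_like `lags` can be illustrated with a small check. This is only a sketch: `as_lag_progression` is a hypothetical name, it is not the PR's `_lags_from_lags` helper, and the choice to reject empty input and zero steps is an assumption made for the example.

```python
import numpy as np

def as_lag_progression(lags):
    """Hypothetical helper: return (start, step, count) for an integer
    arithmetic progression, or raise ValueError otherwise."""
    arr = np.asarray(lags)
    if arr.ndim != 1 or arr.size == 0 or not np.issubdtype(arr.dtype, np.integer):
        raise ValueError("lags must be a non-empty 1-D integer sequence")
    if arr.size == 1:
        # A single lag is trivially a progression; the step is arbitrary.
        return int(arr[0]), 1, 1
    steps = np.diff(arr)
    # Every consecutive difference must equal the first one, and be nonzero.
    if steps[0] == 0 or np.any(steps != steps[0]):
        raise ValueError("lags must be an arithmetic progression with nonzero step")
    return int(arr[0]), int(steps[0]), int(arr.size)

print(as_lag_progression(range(-4, 5, 2)))
print(as_lag_progression([3, 1, -1]))
```

A negative step, as in the second call, is accepted here because the PR description says the C function normalizes negative steps internally.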

  • returns_lagvector=True → returns (result, lagvector).
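For comparison, the symmetric window described above can be emulated today with `mode='full'` plus a slice, which is the "full + slice" baseline. The sketch below uses only the existing `np.correlate`; the function name `correlate_maxlag_reference` is hypothetical, and whether the PR's lag sign convention matches this one exactly is an assumption.

```python
import numpy as np

def correlate_maxlag_reference(a, v, maxlag):
    """Hypothetical reference: lags [-maxlag, maxlag] via mode='full' + slice."""
    a = np.asarray(a)
    v = np.asarray(v)
    if not 0 <= maxlag <= min(len(a), len(v)) - 1:
        raise ValueError("maxlag must fit inside the full output")
    full = np.correlate(a, v, mode="full")
    # In 'full' mode, output index i corresponds to lag i - (len(v) - 1),
    # so lag 0 sits at index len(v) - 1.
    zero = len(v) - 1
    window = full[zero - maxlag : zero + maxlag + 1]
    lags = np.arange(-maxlag, maxlag + 1)
    return window, lags

result, lags = correlate_maxlag_reference([1, 2, 3], [0, 1, 0.5], maxlag=1)
print(lags, result)
```

The whole `full` array is still computed before slicing, which is the cost the new parameters are meant to avoid.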

Major file changes

  • Python (numpy/_core/numeric.py): new maxlag/lags/returns_lagvector kwargs on correlate and convolve; helper functions _lags_from_maxlag, _lags_from_lags, _lags_from_mode.
  • C (numpy/_core/src/multiarray/multiarraymodule.c): new PyArray_CorrelateLags public C API; _pyarray_correlate implements non-mode lags and is now the single normalization site (handles array swap, negative-step normalization, output reversal internally — callers pass any valid form).

Benchmark

For a user interested in a small number of lags around 0, the speedup over computing the full result and slicing it ranges from about 4× to about 2500× in the cases below:

size1     size2    full + slice    maxlag=5    speedup
1000      100      47.2 µs         11.1 µs     4.3×
1000      1000     572 µs          20.2 µs     28×
100000    100      4.09 ms         14.0 µs     290×
100000    1000     60.0 ms         23.4 µs     2500×

(From benchmarks/benchmarks/bench_core.py::CorrConvLags, included in this PR.)
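One way to see where the speedup comes from is to compute only the requested lags, one dot product per lag, so the work grows with the number of lags rather than with the full output length. The following is a pure-NumPy sketch of that idea, not the PR's C implementation; `correlate_lags_direct` is a hypothetical name.

```python
import numpy as np

def correlate_lags_direct(a, v, lags):
    """Hypothetical sketch: c[k] = sum_n a[n + k] * conj(v[n]) for k in lags."""
    a = np.asarray(a)
    v = np.asarray(v)
    lags = list(lags)
    out = np.zeros(len(lags), dtype=np.result_type(a, v))
    for i, k in enumerate(lags):
        # Valid n satisfy 0 <= n < len(v) and 0 <= n + k < len(a).
        n_lo = max(0, -k)
        n_hi = min(len(v), len(a) - k)
        if n_hi > n_lo:
            out[i] = np.dot(a[n_lo + k : n_hi + k], np.conj(v[n_lo:n_hi]))
        # Lags with no overlap keep the initial value 0.
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal(50)
v = rng.standard_normal(7)
print(correlate_lags_direct(a, v, range(-5, 6)))
```

Evaluated over every lag from `-(len(v) - 1)` to `len(a) - 1`, this agrees with `np.correlate(a, v, mode='full')`, which is how the sketch can be checked against the existing behavior.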

First time committer introduction

Hi! I have used NumPy for more than a decade now, both directly and as the foundation of the whole scientific Python ecosystem (e.g. pandas, PyTorch). When I started working on this in 2015, I was using NumPy for the first time, to build simulations of biological neural networks that would be open source and more performant than MATLAB, which I had previously been using. This feature would have helped me quite a bit back then, as I was cross-correlating very long time series but only needed a relatively small window of time lags around 0. I have wanted to come back to it because of the continued interest of others, even though I no longer need this specific functionality myself.

AI Disclosure

Claude Code is what finally allowed me to bring this PR to completion.

I used Claude as a pair-programming assistant for:

  • chasing down bugs (in particular it helped me find a buffer overflow that the earlier PR attempt had introduced in _pyarray_correlate);
  • proposing a more intuitive argument list for the Python API that I approved;
  • refactoring the vector inversion logic so the C function would be the single normalization site;
  • proposing a more complete structure for the test cases (geometry/equivalence/dtype groupings); the expected values in the tests were computed by hand or by running and sub-slicing the pre-existing implementation;
  • writing some of the docstrings (Parameters/Returns wording).

I take full responsibility for every line in this PR.

Brings the maxlag2025 branch up to date with current numpy/numpy main
(843 commits since branch base).

Conflict resolution in numpy/_core/src/multiarray/multiarraymodule.c:
- Adopt upstream's _pyarray_correlate signature change from
  `int typenum` to `PyArray_Descr *typec` (PR numpy#30931).
- Keep our refactor: the function still takes (minlag, maxlag, lagstep)
  instead of (mode, *inverted), and handles array swap, negative-step
  normalization, and output reversal internally.
- Update PyArray_Correlate, PyArray_Correlate2, and PyArray_CorrelateLags
  to use upstream's PyArray_DTypeFromObject + NPY_DT_CALL_ensure_canonical
  pattern for type resolution, with proper Py_DECREF(typec) cleanup.
- Update the new array_correlatelags argument parser to use upstream's
  brace-syntax for npy_parse_arguments.

Add NPY_2_6_API_VERSION 0x00000016 and matching version-string clause
in numpy/_core/include/numpy/numpyconfig.h to support the bumped
C API version that registers PyArray_CorrelateLags at slot 369.

Tests: numpy/_core/tests/test_numeric.py: 512 passed, 1 skipped.
@honi-at-simspace honi-at-simspace changed the title from "Maxlag2025" to "ENH: add maxlag and lags parameters to correlate/convolve" Apr 16, 2026