fix(models): drop persistent energies buffer in Ewald + PME #82
Ryan-Reese wants to merge 1 commit into NVIDIA:main
Conversation
EwaldModelWrapper and PMEModelWrapper kept a persistent `self._energies_buf` and reused it each forward via `zero_()` + `scatter_add_`. Because `per_atom_energies` carries a `grad_fn` from the Warp backward tape, the in-place ops chain each step's autograd graph onto the buffer via PyTorch's version-counter mechanism, causing linear per-step slowdown and unbounded GPU-memory growth in long MD runs. Allocate a fresh (B,) buffer per forward and clone on return so callers continue to receive a tensor with independent storage (matches the prior output contract; some downstream consumers mutate batch.energy in place). Adds two regression tests per wrapper: one asserting `_energies_buf` is absent after a forward, one asserting successive forwards return energy tensors whose storage is independent of prior outputs.
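For readers skimming the conversation, a minimal sketch of the before/after pattern described here (simplified, outside the actual wrapper code; function and argument names are illustrative):

```python
import torch

def persistent_buffer_step(buf, batch_idx, per_atom_energies):
    # Old pattern (simplified): in-place ops on a module-owned buffer. Once
    # per_atom_energies carries a grad_fn, buf acquires one too, and every
    # later zero_()/scatter_add_ chains the new step's graph onto the old one.
    buf.zero_()
    buf.scatter_add_(0, batch_idx, per_atom_energies)
    return buf

def fresh_buffer_step(num_systems, batch_idx, per_atom_energies):
    # New pattern (simplified): fresh (B,) buffer per forward, so nothing owned
    # by the module retains a grad_fn and each step's graph can be freed.
    energies = per_atom_energies.new_zeros(num_systems)
    energies.scatter_add_(0, batch_idx, per_atom_energies)
    return energies.clone()  # callers keep getting independent storage, as before
```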
ALCHEMI Toolkit Pull Request
Description
`EwaldModelWrapper` and `PMEModelWrapper` kept a persistent `self._energies_buf` tensor and reused it each forward via `zero_()` + `scatter_add_(0, batch_idx, per_atom_energies)`. Because `per_atom_energies` carries a `grad_fn` from the Warp backward tape, the in-place ops chain each step's autograd graph onto the buffer through PyTorch's version-counter mechanism. The buffer is owned by `self`, so the chain cannot be GC'd — per-step cost grows linearly and GPU memory climbs unboundedly in long MD runs. This PR drops the persistent buffer and allocates a fresh one per forward.
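A hypothetical standalone repro of the failure mode (plain PyTorch, not nvalchemi code; all names here are illustrative). It shows the autograd graph reachable from a persistent buffer growing on every step:

```python
import torch

def graph_size(t):
    # Count autograd nodes reachable from t.grad_fn.
    seen, stack = set(), [t.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        stack.extend(f for f, _ in fn.next_functions)
    return len(seen)

buf = torch.zeros(4)                          # stand-in for the persistent self._energies_buf
batch_idx = torch.tensor([0, 1, 1, 2, 3, 3])

for step in range(5):
    per_atom = torch.randn(6, requires_grad=True) * 1.0  # carries a grad_fn, like the Warp output
    buf.zero_()                               # recorded by autograd once buf has a grad_fn
    buf.scatter_add_(0, batch_idx, per_atom)  # chains this step's graph onto the buffer
    print(step, graph_size(buf))              # node count (and retained memory) keeps climbing
```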
Type of Change

Related Issues
N/A — bug surfaced during downstream MD integration work, not tracked in an existing issue.
Changes Made
- `nvalchemi/models/ewald.py`, `nvalchemi/models/pme.py`: drop `self._energies_buf`; allocate a fresh `(B,)` zero buffer per `forward()`, scatter into it, clone on return so callers continue to receive a tensor with independent storage (matches the prior output contract; some downstream consumers mutate `batch.energy` in place).
- `test/models/test_ewald.py`, `test/models/test_pme.py`: add two regression tests per wrapper — one asserts `_energies_buf` is absent after a forward, one asserts consecutive forwards return energy tensors whose storage is independent of prior outputs (see the sketch after this list).
- `CHANGELOG.md`: add entry under Unreleased.
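A rough sketch of what the two new assertions check, under assumed names (the wrapper construction, the `batch` fixture, and the `.energy` output attribute are placeholders; the real tests live in the files above):

```python
import torch

def check_energy_output_contract(wrapper: torch.nn.Module, batch) -> None:
    # 1) No persistent buffer survives a forward pass.
    out1 = wrapper(batch)
    assert not hasattr(wrapper, "_energies_buf")
    # 2) Successive forwards return energy tensors with independent storage.
    out2 = wrapper(batch)
    e1, e2 = out1.energy, out2.energy  # assumed output attribute (cf. batch.energy above)
    assert e1.data_ptr() != e2.data_ptr()
```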
Testing

- `make pytest` — `test/models/test_ewald.py`, `test/models/test_pme.py`: 134 passed (130 existing + 4 new)
- `make lint` passes

Empirical verification
1000 NVT Langevin steps, naphthalene (3,6,3) = 1944 atoms, PBC, AIMNet2 + electrostatics (`accuracy=1e-6`, `hybrid_forces=False`, `compile_model=True`), dt = 1.5 fs, T = 200 K, friction = 0.01, 50-step timing blocks.

Tested on: NVIDIA H100 NVL (95 GB, driver 570.195.03), ARM (Linux 6.8.0), PyTorch 2.11.0+cu130 (CUDA 13.0), Python 3.12.13.
Figure (per-step runtime + GPU memory vs step, all four configs) attached below.
Numerical equivalence. GPU `scatter_add_` is non-deterministic (atomic-add ordering), so neither variant is bit-exact against itself. Measured relative L2 deltas of per-atom positions and forces at step 20 (seed = 42): unfixed-vs-fixed deltas are within the same-code run-to-run envelope for both models. No systematic shift.
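For reference, a minimal sketch of the metric quoted above, assuming the usual definition of a relative L2 delta (the exact measurement script is not part of this PR):

```python
import torch

def rel_l2_delta(a: torch.Tensor, b: torch.Tensor) -> float:
    # Relative L2 delta: ||a - b||_2 / ||b||_2, computed over all atoms.
    return (torch.linalg.norm(a - b) / torch.linalg.norm(b)).item()
```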
Checklist

Additional Notes
`nvalchemi/models/lj.py` has a superficially similar pattern, but with `self._atomic_energies_buf` (a kernel output buffer) on the scatter RHS — the autograd story is different there. Out of scope for this PR; flagged as a separate investigation / follow-up if it turns out to leak.
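One cheap check for that follow-up (generic PyTorch, not tied to lj.py internals; the buffer name is taken from the note above) is whether the flagged buffer ever acquires a `grad_fn` after a forward:

```python
import torch

def buffer_stays_graph_free(module: torch.nn.Module, name: str = "_atomic_energies_buf") -> bool:
    # A buffer that is not supposed to participate in autograd should have no
    # grad_fn after a forward pass; if it does, it is retaining graph history.
    buf = getattr(module, name, None)
    return buf is None or buf.grad_fn is None
```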