
fix(models): drop persistent energies buffer in Ewald + PME#82

Open
Ryan-Reese wants to merge 1 commit into NVIDIA:main from Ryan-Reese:fix/ewald-pme-energies-buf-leak

Conversation

Ryan-Reese (Contributor) commented Apr 24, 2026

ALCHEMI Toolkit Pull Request

Description

EwaldModelWrapper and PMEModelWrapper kept a persistent self._energies_buf tensor and reused it each forward via zero_() + scatter_add_(0, batch_idx, per_atom_energies). Because per_atom_energies carries a grad_fn from the Warp backward tape, the in-place ops chain each step's autograd graph onto the buffer through PyTorch's version-counter mechanism. The buffer is owned by self, so the chain cannot be GC'd — per-step cost grows linearly and GPU memory climbs unboundedly in long MD runs. This PR drops the persistent buffer and allocates a fresh one per forward.
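
A minimal sketch of the leak and the fix, with tensor names taken from the description above (illustrative only; the actual wrapper code differs):

```python
import torch

class Before(torch.nn.Module):
    """Leaky pattern: module-owned buffer mutated in place each forward."""

    def __init__(self, num_systems: int):
        super().__init__()
        self._energies_buf = torch.zeros(num_systems)

    def forward(self, per_atom_energies, batch_idx):
        # per_atom_energies carries a grad_fn, so both in-place ops record
        # autograd nodes on the buffer; each step's graph chains onto the
        # previous step's and, because self owns the buffer, is never freed.
        self._energies_buf.zero_()
        self._energies_buf.scatter_add_(0, batch_idx, per_atom_energies)
        return self._energies_buf

class After(torch.nn.Module):
    """Fixed pattern: fresh buffer per forward; the graph dies with the step."""

    def forward(self, per_atom_energies, batch_idx, num_systems: int):
        energies = torch.zeros(
            num_systems,
            dtype=per_atom_energies.dtype,
            device=per_atom_energies.device,
        )
        energies.scatter_add_(0, batch_idx, per_atom_energies)
        return energies.clone()  # callers get independent storage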

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance improvement
  • Documentation update
  • Refactoring (no functional changes)
  • CI/CD or infrastructure change

Related Issues

N/A — bug surfaced during downstream MD integration work, not tracked in an existing issue.

Changes Made

  • nvalchemi/models/ewald.py, nvalchemi/models/pme.py: drop self._energies_buf; allocate a fresh (B,) zero buffer per forward(), scatter into it, clone on return so callers continue to receive a tensor with independent storage (matches the prior output contract; some downstream consumers mutate batch.energy in place).
  • test/models/test_ewald.py, test/models/test_pme.py: add two regression tests per wrapper (sketched after this list): one asserts _energies_buf is absent after a forward, the other asserts consecutive forwards return energy tensors whose storage is independent of prior outputs.
  • CHANGELOG.md: add entry under Unreleased.
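
A sketch of the shape of the two regression tests (the `wrapper`/`batch` fixtures are hypothetical stand-ins; the real tests mirror this pattern per wrapper):

```python
import torch

def test_no_persistent_energies_buf(wrapper, batch):
    # `wrapper` and `batch` are hypothetical fixtures standing in for an
    # Ewald/PME wrapper instance and a small periodic test system.
    wrapper(batch)
    assert not hasattr(wrapper, "_energies_buf")

def test_energy_outputs_have_independent_storage(wrapper, batch):
    e1 = wrapper(batch)  # assumed to return the (B,) energy tensor
    e2 = wrapper(batch)
    assert e1.data_ptr() != e2.data_ptr()
    ref = e1.clone()
    e2.fill_(0.0)  # mutating one output must not change the other
    assert torch.equal(e1, ref)
```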

Testing

  • Unit tests pass locally (make pytest test/models/test_ewald.py test/models/test_pme.py = 134 passed, 130 existing + 4 new)
  • Linting passes (make lint)
  • New tests added for the fix; coverage expectations met

Empirical verification


1000 NVT Langevin steps, naphthalene (3,6,3) supercell (1944 atoms), PBC, AIMNet2 + electrostatics (accuracy=1e-6, hybrid_forces=False, compile_model=True), dt = 1.5 fs, T = 200 K, friction = 0.01, 50-step timing blocks.

Tested on: NVIDIA H100 NVL (95 GB, driver 570.195.03), ARM (Linux 6.8.0), PyTorch 2.11.0+cu130 (CUDA 13.0), Python 3.12.13.

| Config | step 50 (ms/step) | step 950 (ms/step) | last / first | GPU mem start | GPU mem end |
| --- | --- | --- | --- | --- | --- |
| Ewald (unfixed) | 146.5 | 456.9 | 3.1× | 175.8 MB | 196.5 MB |
| Ewald (fixed) | 94.4 | 79.7 | 0.84× (warmup) | 174.7 MB | 174.7 MB |
| PME (unfixed) | 111.8 | 568.6 | 5.1× | 183.9 MB | 204.6 MB |
| PME (fixed) | 91.6 | 76.0 | 0.83× (warmup) | 182.8 MB | 182.8 MB |

Figure (per-step runtime + GPU memory vs step, all four configs) attached below.
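
For reference, a minimal sketch of how 50-step timing blocks and allocator readings can be collected (an assumed harness, not necessarily the exact one used here):

```python
import time
import torch

def timed_blocks(step_fn, n_steps=1000, block=50):
    """Yield (step, ms/step averaged over the block, MiB allocated)."""
    for start in range(0, n_steps, block):
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(block):
            step_fn()
        torch.cuda.synchronize()
        ms = (time.perf_counter() - t0) * 1e3 / block
        mib = torch.cuda.memory_allocated() / 2**20
        yield start + block, ms, mib
```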

Numerical equivalence. GPU scatter_add_ is non-deterministic (atomic-add ordering), so neither variant is bit-exact against itself. Measured relative L2 deltas of per-atom positions and forces at step 20 (seed = 42): the unfixed-vs-fixed deltas fall within the same-code run-to-run envelope for both models. No systematic shift.
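
The relative L2 delta here is the usual ||a - b||_2 / ||a||_2; a sketch, assuming two per-atom force snapshots `f_*` from separate runs:

```python
import torch

def rel_l2(a: torch.Tensor, b: torch.Tensor) -> float:
    # ||a - b||_2 / ||a||_2 over all per-atom components
    return (torch.linalg.vector_norm(a - b)
            / torch.linalg.vector_norm(a)).item()

# unfixed-vs-fixed should land in the same range as two runs of identical
# code, i.e. rel_l2(f_unfixed, f_fixed) ~ rel_l2(f_run1, f_run2).
```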

Checklist

  • I have read and understand the Contributing Guidelines
  • I have updated the CHANGELOG.md
  • I have performed a self-review of my code
  • I have added docstrings to new functions/classes
  • I have updated the documentation (if applicable)

Additional Notes

nvalchemi/models/lj.py has a superficially similar pattern but with self._atomic_energies_buf (a kernel output buffer) on the scatter RHS — the autograd story is different there. Out of scope for this PR; flagged as a separate investigation / follow-up if it turns out to leak.


copy-pr-bot Bot commented Apr 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Ryan-Reese marked this pull request as ready for review April 24, 2026 01:53

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR removes the persistent self._energies_buf tensor from EwaldModelWrapper and PMEModelWrapper, replacing the reused-buffer pattern with a fresh torch.zeros allocation each forward(). The root cause — in-place scatter_add_ of a gradient-carrying tensor into a module-owned buffer growing the autograd graph linearly across MD steps — is correctly diagnosed and the fix is minimal and precise. The .clone() on return is still warranted to protect callers that mutate batch.energy in-place, and the local energies variable goes out of scope cleanly after each call, allowing immediate GC.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nvalchemi/models/ewald.py | Drops self._energies_buf; allocates torch.zeros(B, ...) locally each forward and clones before returning, which breaks the version-counter chain without changing the output contract. |
| nvalchemi/models/pme.py | Identical fix to ewald.py: persistent _energies_buf removed, fresh buffer allocated per forward, clone on return preserved. |
| test/models/test_ewald.py | Adds two regression tests: absence of the _energies_buf attribute post-forward, and independence of consecutive energy tensor outputs. |
| test/models/test_pme.py | Same pair of regression tests as test_ewald.py, mirrored for PMEModelWrapper. |
| CHANGELOG.md | Adds an Unreleased entry describing the buffer-drop fix for both Ewald and PME wrappers. |


