max_memory_padding not forwarded to BinningAutoBatcher during optimizer init in optimize() #514

@niklashoelter

Description

When passing an InFlightAutoBatcher with a custom max_memory_padding to optimize(), the padding value does not appear to be forwarded to the internal _chunked_apply() call used for optimizer initialization (e.g. FIRE init). This causes the BinningAutoBatcher created inside _chunked_apply() to default to max_memory_padding=1.0, effectively using no safety margin during memory estimation for the init phase.
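
A minimal sketch of the failing call pattern described above. Here `model` and `states` are placeholders for a fairchem model wrapper and a batched state of ~4000 structures, the optimizer handle and the `memory_scales_with` value are illustrative assumptions, and exact signatures may differ from the installed version:

```python
import torch_sim as ts
from torch_sim.autobatching import InFlightAutoBatcher

# `model` and `states` are placeholders for a fairchem model wrapper and a
# batched state of ~4000 structures; their construction is omitted here.
batcher = InFlightAutoBatcher(
    model=model,
    memory_scales_with="n_atoms",  # assumed value, for illustration only
    max_memory_padding=0.5,        # request 50% headroom
)

# During optimizer init, optimize() runs fire_init() through _chunked_apply(),
# which builds its own BinningAutoBatcher. The 0.5 above is never forwarded,
# so the init phase effectively uses the default max_memory_padding=1.0.
relaxed = ts.optimize(
    system=states,
    model=model,
    optimizer=ts.fire,  # FIRE; its init allocates velocities, dt, alpha, ...
    autobatcher=batcher,
)
```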

Observed behavior

  • OOM errors during FIRE initialization on large workloads (~4000 structures, 24 GB GPU) even with a conservative max_memory_padding (e.g. 0.5); see the back-of-the-envelope sketch after this list
  • Changing max_memory_padding on the InFlightAutoBatcher has no effect on the outcome
  • The same workload succeeds with smaller structure counts (~100)
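
For scale, a back-of-the-envelope reading of those numbers, assuming max_memory_padding scales the usable memory budget (semantics inferred from this report, not verified against the library source):

```python
# Rough arithmetic on a 24 GB card (padding semantics inferred, not verified):
total_gpu_gb = 24.0

expected_budget = total_gpu_gb * 0.5  # padding the user asked for -> 12 GB
actual_budget = total_gpu_gb * 1.0    # default used during init   -> 24 GB

# Batches sized against the 24 GB figure leave zero room for the extra
# optimizer state that fire_init() allocates, hence the OOM.
print(f"expected {expected_budget} GB, init actually budgets {actual_budget} GB")
```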

Suspected cause

In runners.py, optimize() forwards max_memory_scaler, memory_scales_with, max_atoms_to_try, and oom_error_message to _chunked_apply(), but not max_memory_padding. The BinningAutoBatcher created inside _chunked_apply() then falls back to its default max_memory_padding=1.0 (no headroom), so the memory estimator sizes batches to fill 100% of GPU memory based on a bare forward pass alone. When fire_init() then allocates additional optimizer state (velocities, dt, alpha, etc.) on top, the batch exceeds available GPU memory.
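
Schematically, the suspected gap would look like the following. This is a paraphrase of the call site, not the verbatim runners.py source; the autobatcher attribute names and the shape of the fix are assumptions:

```python
# Paraphrase of the suspected optimize() init path in runners.py, not the
# verbatim source. Most batching knobs are forwarded to _chunked_apply():
initial_states = _chunked_apply(
    fire_init,
    states,
    model,
    max_memory_scaler=autobatcher.max_memory_scaler,
    memory_scales_with=autobatcher.memory_scales_with,
    max_atoms_to_try=autobatcher.max_atoms_to_try,
    oom_error_message=oom_error_message,
    # ...but the padding is not, so the BinningAutoBatcher constructed inside
    # _chunked_apply() falls back to max_memory_padding=1.0. A fix along the
    # lines of #513 would forward it as well:
    # max_memory_padding=autobatcher.max_memory_padding,
)
```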

Fix

Addressed in #513.

Environment

  • torch-sim-atomistic==0.5.2
  • torch==2.8.0+cu128
  • fairchem-core==2.18.0
  • GPU: NVIDIA Quadro RTX 6000 (24 GB)
  • Python 3.12
