### Description
When passing an `InFlightAutoBatcher` with a custom `max_memory_padding` to `optimize()`, the padding value does not appear to be forwarded to the internal `_chunked_apply()` call used for optimizer initialization (e.g. FIRE init). This causes the `BinningAutoBatcher` created inside `_chunked_apply()` to default to `max_memory_padding=1.0`, effectively using no safety margin during memory estimation for the init phase.
### Observed behavior

- OOM errors during FIRE initialization on large workloads (~4000 structures, 24 GB GPU), even with a conservative `max_memory_padding` (e.g. `0.5`)
- Changing `max_memory_padding` on the `InFlightAutoBatcher` has no effect on the outcome
- The same workload succeeds with smaller structure counts (~100)
### Suspected cause
In `runners.py`, `optimize()` forwards `max_memory_scaler`, `memory_scales_with`, `max_atoms_to_try`, and `oom_error_message` to `_chunked_apply()` — but not `max_memory_padding`. The `BinningAutoBatcher` created inside `_chunked_apply()` then defaults to `max_memory_padding=1.0` (no headroom), and the memory estimator determines batch sizes that fill 100% of GPU memory based on a bare forward pass. When `fire_init()` then allocates additional optimizer state (velocities, dt, alpha, etc.) on top, it exceeds GPU memory.
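A toy numeric sketch of the failure mode described above (this is not torch-sim's actual code; the per-structure memory cost and the FIRE state overhead factor are hypothetical round numbers chosen for illustration):

```python
def max_batch_size(gpu_mem_gb: float, mem_per_structure_gb: float,
                   max_memory_padding: float = 1.0) -> int:
    """Largest batch whose estimated forward-pass memory fits within
    gpu_mem_gb * max_memory_padding (padding < 1.0 reserves headroom)."""
    budget = gpu_mem_gb * max_memory_padding
    return int(budget // mem_per_structure_gb)


GPU_GB = 24.0            # e.g. a 24 GB Quadro RTX 6000
PER_STRUCTURE_GB = 0.5   # hypothetical forward-pass cost per structure
FIRE_STATE_FACTOR = 0.4  # hypothetical extra memory for velocities/dt/alpha


def peak_memory_gb(n_structures: int) -> float:
    """Forward-pass memory plus FIRE optimizer state allocated on top."""
    forward = n_structures * PER_STRUCTURE_GB
    return forward * (1.0 + FIRE_STATE_FACTOR)


# Padding not forwarded -> defaults to 1.0, batch fills 100% of memory:
n_default = max_batch_size(GPU_GB, PER_STRUCTURE_GB)        # 48 structures
# Padding forwarded as the user requested (0.5) -> conservative batch:
n_padded = max_batch_size(GPU_GB, PER_STRUCTURE_GB, 0.5)    # 24 structures

print(peak_memory_gb(n_default))  # exceeds 24 GB -> OOM during FIRE init
print(peak_memory_gb(n_padded))   # fits within 24 GB
```

The batch sized against 100% of memory leaves no room for the optimizer state, while the user's padding of 0.5 would have left ample headroom — which matches the observed behavior that the setting has no effect.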
### Fix
Addressed in #513.
### Environment

- `torch-sim-atomistic==0.5.2`
- `torch==2.8.0+cu128`
- `fairchem-core==2.18.0`
- GPU: NVIDIA Quadro RTX 6000 (24 GB)
- Python 3.12