Fix C-heap memory leaks in solver state cleanup#2981
Conversation
Problem HiGHS leaks C-heap memory when a Highs instance is reused across multiple solves. Python GC objects show zero growth but RSS climbs monotonically. Benchmarks on a long-running service showed: ┌───────────┬────────────────┬───────────────┬──────────┐ │ Allocator │ Avg growth/job │ After 10 jobs │ Peak RSS │ ├───────────┼────────────────┼───────────────┼──────────┤ │ glibc │ +10.13 MB │ +101.3 MB │ 952 MB │ ├───────────┼────────────────┼───────────────┼──────────┤ │ jemalloc │ +2.74 MB │ +27.4 MB │ 901 MB │ └───────────┴────────────────┴───────────────┴──────────┘ Root causes 1. saved_objective_and_solution_ never cleared — clearModel() clears model_ and multi_linear_objective_ but skips this vector of MIP solution snapshots. It accumulates indefinitely across solves and is only freed by the destructor. 2. invalidateSolution() / invalidateBasis() retain vectors — Both methods only flip boolean flags (value_valid = false, valid = false) but leave the underlying std::vectors (col_value, col_dual, row_value, row_dual, col_status, row_status) allocated. The actual .clear() methods that free the vectors exist but are never called. 3. HEkk::invalidate() retains all simplex memory — clearSolver() calls ekk_instance_.invalidate() which only sets three status flags. The actual HEkk::clear() method (which frees LP data, working arrays, dual edge weights, factorization data, and NLA state) is never invoked during normal operation. All simplex working arrays from previous solves persist for the lifetime of the Highs instance. 4. std::vector capacity retention — Even when vectors are .clear()ed, the allocated capacity is retained. With glibc's allocator these pages are never returned to the OS, causing heap fragmentation. Changes highs/lp_data/Highs.cpp: - clearModel(): Added saved_objective_and_solution_.clear() to free accumulated MIP solutions between model changes. - clearSolver(): Added ekk_instance_.clear() to free all simplex working arrays (LP data, dual edge weights, factorization, NLA) instead of just invalidating status flags. - invalidateSolution(): Changed solution_.invalidate() to solution_.clear() to actually free the four solution vectors. - invalidateBasis(): Changed basis_.invalidate() to basis_.clear() to actually free the two basis status vectors. - releaseMemory(): New public method that calls clearModel() then shrink_to_fit() on all retained vectors (solution, basis, ranging) to return unused capacity to the allocator. highs/Highs.h: Declared Highs::releaseMemory() with documentation. highs/highs_bindings.cpp: Exposed releaseMemory to Python via pybind11. highs/highspy/_core/__init__.pyi: Added type stub for releaseMemory(). Impact - clearSolver() now properly frees all solver memory between solves, eliminating the primary source of RSS growth. - releaseMemory() provides an explicit API for long-running services to return memory to the OS after a solve, addressing heap fragmentation from vector capacity retention. - The invalidate() → clear() changes in invalidateSolution() and invalidateBasis() ensure solution and basis vectors are freed immediately rather than retained indefinitely. - saved_objective_and_solution_ is now properly cleaned up in clearModel(), preventing unbounded accumulation of MIP solution snapshots. Testing - All existing CMake tests pass (2/2). - Full build succeeds with zero warnings on the changed files. - The changes are additive — clear() subsumes invalidate() (it calls invalidate() internally) so no behavioral change beyond memory being freed.
|
Thanks for this. I was unaware that vector::clear didn't free the capacity I'll have to check that no useful simplex data is lost. We merge to latest, keeping master just for releases, so I changed base branch |
|
Thanks for looking at this. To clarify — the main fixes are about calling clear() where only invalidate() was called before (so vectors were never freed at all), not about vector::clear() capacity behavior. The invalidate() → clear() changes in invalidateSolution() and invalidateBasis() are safe — clear() calls invalidate() internally first, so the semantic flags are still set. The ekk_instance_.clear() in clearSolver() is the one worth reviewing carefully for warm-start impact. The releaseMemory() method is the only new addition — it's opt-in and specifically for long-running services that need to return memory to the OS between solves. |
|
There are some non-trivial CI failures, so base branch is now fix-2981 so that they can be investigated and fixed |
|
The issue seems to be the change from |
|
I'll merge this and look at the CI failures |
Problem
HiGHS leaks C-heap memory when a Highs instance is reused across multiple solves. Python GC objects show zero growth but RSS climbs monotonically. Benchmarks on a long-running service showed:
┌───────────┬────────────────┬───────────────┬──────────┐
│ Allocator │ Avg growth/job │ After 10 jobs │ Peak RSS │
├───────────┼────────────────┼───────────────┼──────────┤
│ glibc │ +10.13 MB │ +101.3 MB │ 952 MB │
├───────────┼────────────────┼───────────────┼──────────┤
│ jemalloc │ +2.74 MB │ +27.4 MB │ 901 MB │
└───────────┴────────────────┴───────────────┴──────────┘
Root causes
Changes
highs/lp_data/Highs.cpp:
highs/Highs.h: Declared Highs::releaseMemory() with documentation.
highs/highs_bindings.cpp: Exposed releaseMemory to Python via pybind11.
highs/highspy/_core/init.pyi: Added type stub for releaseMemory().
Impact
Testing