Fix C-heap memory leaks in solver state cleanup #2981

Merged
jajhall merged 1 commit into ERGO-Code:fix-2981 from
mstumberger:fix/c-heap-memory-leaks-in-solver-cleanup
Apr 20, 2026

Conversation

@mstumberger

Problem

HiGHS leaks C-heap memory when a Highs instance is reused across multiple solves. Python GC objects show zero growth but RSS climbs monotonically. Benchmarks on a long-running service showed:
┌───────────┬────────────────┬───────────────┬──────────┐
│ Allocator │ Avg growth/job │ After 10 jobs │ Peak RSS │
├───────────┼────────────────┼───────────────┼──────────┤
│ glibc     │ +10.13 MB      │ +101.3 MB     │ 952 MB   │
├───────────┼────────────────┼───────────────┼──────────┤
│ jemalloc  │ +2.74 MB       │ +27.4 MB      │ 901 MB   │
└───────────┴────────────────┴───────────────┴──────────┘

Root causes

  1. saved_objective_and_solution_ never cleared: clearModel() clears model_ and multi_linear_objective_ but skips this vector of MIP solution snapshots. It accumulates across solves and is only freed by the destructor.
  2. invalidateSolution() / invalidateBasis() retain vectors: Both methods only flip boolean flags (value_valid = false, valid = false) and leave the underlying std::vectors (col_value, col_dual, row_value, row_dual, col_status, row_status) allocated. The .clear() methods that empty the vectors exist but are never called.
  3. HEkk::invalidate() retains all simplex memory: clearSolver() calls ekk_instance_.invalidate(), which only sets three status flags. The HEkk::clear() method (which frees LP data, working arrays, dual edge weights, factorization data, and NLA state) is never invoked during normal operation, so all simplex working arrays from previous solves persist for the lifetime of the Highs instance.
  4. std::vector capacity retention: Even when a vector is .clear()ed, its allocated capacity is retained. With glibc's allocator these pages are often not returned to the OS, which shows up as heap fragmentation and a high RSS.
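
The capacity behaviour in point 4 can be checked in isolation. The following is a minimal standalone sketch, not HiGHS code; the helper names capacityAfterClear and capacityAfterSwap are made up for illustration. It compares clear() with the swap-with-an-empty-temporary idiom, which on common standard libraries leaves the vector with no reserved storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Capacity still reserved by a vector after clear() has emptied it.
std::size_t capacityAfterClear(std::size_t n) {
  std::vector<double> v(n, 1.0);
  v.clear();  // size becomes 0, the allocated storage is kept
  assert(v.empty());
  return v.capacity();
}

// Capacity after swapping with an empty temporary. The old buffer moves
// into the temporary and is deallocated when the statement ends.
std::size_t capacityAfterSwap(std::size_t n) {
  std::vector<double> v(n, 1.0);
  std::vector<double>().swap(v);
  assert(v.empty());
  return v.capacity();
}
```

Calling capacityAfterClear(1000) should report at least 1000, while capacityAfterSwap(1000) is typically 0. shrink_to_fit() is the more readable alternative, but the C++ standard defines it as a non-binding request, and neither approach guarantees that the allocator hands the pages back to the OS.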

Changes

highs/lp_data/Highs.cpp:

  • clearModel(): Added saved_objective_and_solution_.clear() to free accumulated MIP solutions between model changes.
  • clearSolver(): Added ekk_instance_.clear() to free all simplex working arrays (LP data, dual edge weights, factorization, NLA) instead of just invalidating status flags.
  • invalidateSolution(): Changed solution_.invalidate() to solution_.clear() to actually free the four solution vectors.
  • invalidateBasis(): Changed basis_.invalidate() to basis_.clear() to actually free the two basis status vectors.
  • releaseMemory(): New public method that calls clearModel() then shrink_to_fit() on all retained vectors (solution, basis, ranging) to return unused capacity to the allocator.

highs/Highs.h: Declared Highs::releaseMemory() with documentation.

highs/highs_bindings.cpp: Exposed releaseMemory to Python via pybind11.

highs/highspy/_core/__init__.pyi: Added type stub for releaseMemory().
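
To make the difference between the three levels of cleanup concrete, here is a minimal sketch on a hypothetical SolutionSketch struct. It is not the HiGHS solution class and not the code in this PR; the member names col_value, row_value and value_valid are borrowed from the description above, while storedValues and makeSolved are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a solution record; not the HiGHS class.
struct SolutionSketch {
  bool value_valid = false;
  std::vector<double> col_value;
  std::vector<double> row_value;

  // Only marks the data as stale; the vectors keep their elements.
  void invalidate() { value_valid = false; }

  // Marks the data as stale and also drops the elements.
  void clear() {
    invalidate();
    col_value.clear();
    row_value.clear();
  }

  // Opt-in step: clear, then ask each vector to give up spare capacity.
  // shrink_to_fit() is a non-binding request in the C++ standard.
  void releaseMemory() {
    clear();
    assert(col_value.empty() && row_value.empty());
    col_value.shrink_to_fit();
    row_value.shrink_to_fit();
  }

  std::size_t storedValues() const {
    return col_value.size() + row_value.size();
  }
};

// Fills a record as a solve might, with num_col column and num_row row values.
SolutionSketch makeSolved(std::size_t num_col, std::size_t num_row) {
  SolutionSketch s;
  s.col_value.assign(num_col, 0.0);
  s.row_value.assign(num_row, 0.0);
  s.value_valid = true;
  return s;
}
```

After invalidate() a record built with makeSolved(3, 2) still holds 5 values; after clear() or releaseMemory() it holds none. Only releaseMemory() additionally asks for the capacity to be dropped.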

Impact

  • clearSolver() now properly frees all solver memory between solves, eliminating the primary source of RSS growth.
  • releaseMemory() provides an explicit API for long-running services to return memory to the OS after a solve, addressing heap fragmentation from vector capacity retention.
  • The invalidate() → clear() changes in invalidateSolution() and invalidateBasis() ensure solution and basis vectors are freed immediately rather than retained indefinitely.
  • saved_objective_and_solution_ is now properly cleaned up in clearModel(), preventing unbounded accumulation of MIP solution snapshots.

Testing

  • All existing CMake tests pass (2/2).
  • Full build succeeds with zero warnings on the changed files.
  • The changes are additive — clear() subsumes invalidate() (it calls invalidate() internally) so no behavioral change beyond memory being freed.

@jajhall jajhall changed the base branch from master to latest April 20, 2026 13:23
@jajhall
Member

jajhall commented Apr 20, 2026

Thanks for this. I was unaware that vector::clear didn't free the capacity.

I'll have to check that no useful simplex data is lost.

We merge to latest, keeping master just for releases, so I changed the base branch.

@mstumberger
Author

Thanks for looking at this.

To clarify — the main fixes are about calling clear() where only invalidate() was called before (so vectors were never freed at all), not about vector::clear() capacity behavior.

The invalidate() → clear() changes in invalidateSolution() and invalidateBasis() are safe — clear() calls invalidate() internally first, so the semantic flags are still set.
The difference is just that the underlying vectors are now freed too.

The ekk_instance_.clear() in clearSolver() is the one worth reviewing carefully for warm-start impact.
If that's a concern, it could be gated on !options_.use_warm_start (same condition that already guards clearSolver() in optimizeModel()).
Happy to adjust.
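
A rough sketch of what that gating could look like, using hypothetical stand-in types rather than the real Highs and HEkk classes. Only the use_warm_start flag name comes from the comment above; OptionsSketch, SimplexSketch and clearSolverSketch are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; the names mirror the discussion but are not HiGHS types.
struct OptionsSketch {
  bool use_warm_start = true;
};

struct SimplexSketch {
  bool has_factorization = false;
  std::vector<double> dual_edge_weight;

  // Flags only: the working array stays available for a later warm start.
  void invalidate() { has_factorization = false; }

  // Flags and data: nothing is left to warm start from.
  void clear() {
    invalidate();
    dual_edge_weight.clear();
    assert(dual_edge_weight.empty());
  }
};

// Drops simplex data only when the caller has opted out of warm starts;
// otherwise the data is just marked stale.
void clearSolverSketch(SimplexSketch& simplex, const OptionsSketch& options) {
  if (!options.use_warm_start) {
    simplex.clear();
  } else {
    simplex.invalidate();
  }
}
```

With use_warm_start left at true the working array survives the call, so only users who have already given up warm starts pay for the stricter cleanup.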

The releaseMemory() method is the only new addition — it's opt-in and specifically for long-running services that need to return memory to the OS between solves.

@jajhall jajhall self-requested a review April 20, 2026 16:38
@jajhall jajhall self-assigned this Apr 20, 2026
@jajhall jajhall changed the base branch from latest to fix-2981 April 20, 2026 16:50
@jajhall
Member

jajhall commented Apr 20, 2026

There are some non-trivial CI failures, so the base branch is now fix-2981 so that they can be investigated and fixed.

@filikat
Collaborator

filikat commented Apr 20, 2026

The issue seems to be the change from solution_.invalidate(); to solution_.clear(); in Highs::invalidateSolution and the change from basis_.invalidate(); to basis_.clear(); in Highs::invalidateBasis. This probably breaks some parts of the code that rely on the size of the vectors in solution_ and basis_ to do some operations. Maybe calling solution_.clear() and basis_.clear() within releaseMemory instead could fix the issue.
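
The kind of size dependence described here can be illustrated with a small sketch. BasisSketch, sizesMatch and countNonzeroStatus are hypothetical names, not HiGHS code; the point is only that a loop driven by the model's column count is safe after the flag is flipped but not after the vector has been emptied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basis record; not the HiGHS class.
struct BasisSketch {
  bool valid = false;
  std::vector<int> col_status;
};

// True when the status vector has one entry per model column.
bool sizesMatch(const BasisSketch& basis, std::size_t num_col) {
  return basis.col_status.size() == num_col;
}

// Counts columns with a nonzero status, looping over the model dimension
// rather than the vector's own size.
int countNonzeroStatus(const BasisSketch& basis, std::size_t num_col) {
  assert(sizesMatch(basis, num_col));  // the indexing below relies on this
  int count = 0;
  for (std::size_t i = 0; i < num_col; ++i) {
    if (basis.col_status[i] != 0) ++count;
  }
  return count;
}
```

Setting valid to false leaves sizesMatch(basis, 3) true for a three-column basis, so the loop remains safe. Emptying col_status makes the check fail, and without such a check the loop would read out of bounds.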

@jajhall
Member

jajhall commented Apr 20, 2026

I'll merge this and look at the CI failures.

@jajhall jajhall merged commit ec36097 into ERGO-Code:fix-2981 Apr 20, 2026
190 of 250 checks passed