Skip to content

Fix cross-process Solution unpickle by refreshing Symbol._id (#5444)#5480

Open
kamal20122012 wants to merge 6 commits intopybamm-team:mainfrom
kamal20122012:fix/pickle-all-first-states-5444
Open

Fix cross-process Solution unpickle by refreshing Symbol._id (#5444)#5480
kamal20122012 wants to merge 6 commits intopybamm-team:mainfrom
kamal20122012:fix/pickle-all-first-states-5444

Conversation

@kamal20122012
Copy link
Copy Markdown

Summary

  • Symbol._id is computed in set_id() via hash(...) of a tuple containing strings (self.name, etc.). Python's hash() of strings is randomised per process (PYTHONHASHSEED), so _id values cached in the pickling process are not equal to the values the same content produces in the unpickling process.
  • Containers keyed on Symbols — primarily Discretisation.y_slices, but also various processed-symbol caches — are rebuilt at unpickle time using the keys' (stale) _id. Any lookup that has to go through Discretisation._process_symbol for a fresh symbol therefore misses, and the user sees a ModelError: No key set for variable '…'.
  • model._variables_processed is populated on first access and does survive pickle, which is why touching a variable before pickling masks the bug, and why reloading in the same process never hits it.
  • Fix: Symbol.__setstate__ now refreshes _id after restoring state. Pickle calls __setstate__ depth-first on the object graph and finishes restoring dict keys/values before populating the dict, so any rebuilt container ends up keyed by current-process hashes that match fresh lookups.

Fixes #5444

Test plan

  • Reproduction script from the issue: cross-process save then load now succeeds; new_sol.all_first_states[0][\"Discharge capacity [A.h]\"] returns data without error.
  • New unit test tests/unit/test_expression_tree/test_symbol.py::TestSymbol::test_setstate_refreshes_id — pickles a Variable with a stuffed stale _id and asserts __setstate__ recomputes it. Verified to fail on pre-fix code.
  • New integration test tests/unit/test_solvers/test_solution.py::TestSolution::test_pickle_first_states_across_processes — runs save and load in separate Python subprocesses and asserts the variable access succeeds. Verified to fail on pre-fix code (the load subprocess raises ModelError and exits 1).
  • pytest tests/unit/test_solvers/ tests/unit/test_expression_tree/ — same baseline (601 passed, 21 pre-existing failures all from missing optional skfem / snapshot deps; identical numbers with and without the patch).
  • pre-commit run --files <changed files> — all hooks pass.

`Symbol._id` is computed via `set_id()` using Python's `hash(...)` over a
tuple that includes string fields like `self.name`. Python's hash of
strings is randomised per process via PYTHONHASHSEED, so a `_id` cached
in one process is not equal to the value the same content produces in
another process. When a `Solution` is pickled and reloaded in a fresh
Python process, dicts keyed on `Symbol`s — most notably
`Discretisation.y_slices` — end up indexed by stale hashes from the
pickling process. Lookups for any variable that is not already cached in
`model._variables_processed` (e.g. accessing a variable from
`all_first_states` for the first time after unpickle) raise a
`ModelError` from `Discretisation._process_symbol`.

Refresh `_id` from `Symbol.__setstate__` so unpickled Symbols always
carry a hash computed in the current process. Pickle calls
`__setstate__` on graph nodes depth-first (children before parents) and
fully restores key/value objects before the containing dict is filled,
so the rebuilt dicts end up keyed by current-process hashes that match
fresh lookups.

Adds:
* a unit test that pickles a Variable with a stuffed stale `_id` and
  asserts `__setstate__` recomputes it (single-process simulation of the
  cross-process bug);
* an integration test that runs the save and load steps in separate
  Python subprocesses and asserts that `all_first_states[0][...]`
  returns data without error.

Fixes pybamm-team#5444
@kamal20122012 kamal20122012 requested a review from a team as a code owner April 30, 2026 09:52
- Hoist `pickle`, `subprocess`, and `sys` imports to module top so the
  tests do not trigger Pylint C0415 (import-outside-toplevel).
- Use `monkeypatch.setattr(var, "_id", 12345)` instead of direct
  assignment, and `hash(loaded)` instead of reading `loaded._id`, so the
  test no longer triggers W0212 (protected-access). Behaviour is
  unchanged: the test still simulates a stale, cross-process `_id` and
  asserts that `Symbol.__setstate__` recomputes it.
Codacy runs Bandit, which flagged the pickle and subprocess usages in
the pybamm-team#5444 regression tests:

- B403/B301 on `import pickle` and `pickle.loads(pickled)` — the
  pickled bytes come from a `pickle.dumps` call two lines earlier in
  the same test, so deserialisation is trusted.
- B404 on `import subprocess`, B603 on the two `subprocess.run` calls —
  the executable is `sys.executable` and the arg is a literal Python
  source string built in-process; there is no untrusted input.

Add narrowly-scoped `# nosec` comments with codes and rationale so
Codacy stops flagging these without globally suppressing the rules.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.15%. Comparing base (19c2d57) to head (40aa15b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5480   +/-   ##
=======================================
  Coverage   98.15%   98.15%           
=======================================
  Files         338      338           
  Lines       31115    31125   +10     
=======================================
+ Hits        30542    30552   +10     
  Misses        573      573           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The cross-process pickle test for pybamm-team#5444 inlined `tmp_path` directly into
a Python source string sent to a child interpreter. On Windows, the path
contains backslashes (e.g. `\Users`, `\test_...`) that get parsed as
escape sequences when the child compiles the source, corrupting the
file path.

Use `repr(str(pkl))` so the path is emitted as a properly escaped Python
string literal regardless of platform.
Copy link
Copy Markdown
Member

@MarcBerliner MarcBerliner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this @kamal20122012! Just a couple of minor points -- otherwise looks good to go

Comment on lines +730 to +731
)
assert "OK" in result.stdout
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert that the observable outputs from the save_code and load_code are the same?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7cd0fb5 — the test now captures the discharge-capacity observable in both processes and asserts the stdouts match.

Comment thread CHANGELOG.md Outdated

## Bug fixes

- Fixed cross-process pickle round-trip for `Solution`: accessing variables in `all_first_states` (or any not-yet-cached variable) on a `Solution` loaded in a different Python process previously failed with a `ModelError` from `Discretisation.y_slices`. `Symbol._id` is computed via `hash()` of strings, which is randomised per process; `Symbol.__setstate__` now refreshes `_id` so dicts keyed on Symbols are rebuilt with hashes consistent with the unpickling process. ([#5444](https://github.com/pybamm-team/PyBaMM/issues/5444))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please make this more high-level and point the url to the PR instead of the issue?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7cd0fb5 — trimmed to a single high-level sentence and repointed at the PR.

- Test now captures the observable from both the saving and loading
  processes and asserts they match, instead of only checking that the
  load process printed an "OK" marker.
- Changelog entry trimmed to a single high-level sentence and pointed
  at the PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Variables lost in first states when pickling

2 participants