Fix cross-process Solution unpickle by refreshing Symbol._id (#5444)#5480
kamal20122012 wants to merge 6 commits into pybamm-team:main
Conversation
`Symbol._id` is computed via `set_id()` using Python's `hash(...)` over a tuple that includes string fields like `self.name`. Python's hash of strings is randomised per process via `PYTHONHASHSEED`, so a `_id` cached in one process is not equal to the value the same content produces in another process.

When a `Solution` is pickled and reloaded in a fresh Python process, dicts keyed on `Symbol`s — most notably `Discretisation.y_slices` — end up indexed by stale hashes from the pickling process. Lookups for any variable that is not already cached in `model._variables_processed` (e.g. accessing a variable from `all_first_states` for the first time after unpickle) raise a `ModelError` from `Discretisation._process_symbol`.

Refresh `_id` from `Symbol.__setstate__` so unpickled Symbols always carry a hash computed in the current process. Pickle calls `__setstate__` on graph nodes depth-first (children before parents) and fully restores key/value objects before the containing dict is filled, so the rebuilt dicts end up keyed by current-process hashes that match fresh lookups.

Adds:

* a unit test that pickles a Variable with a stuffed stale `_id` and asserts `__setstate__` recomputes it (single-process simulation of the cross-process bug);
* an integration test that runs the save and load steps in separate Python subprocesses and asserts that `all_first_states[0][...]` returns data without error.

Fixes pybamm-team#5444
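The mechanism can be sketched with a minimal stand-in class; this is illustrative only, not PyBaMM's actual `Symbol` implementation, but it shows the same pattern of a cached, string-derived hash being refreshed in `__setstate__`:

```python
import pickle


class Node:
    """Stand-in for an expression-tree symbol whose identity is a cached
    hash over content; str hashes are salted per process (PYTHONHASHSEED)."""

    def __init__(self, name):
        self.name = name
        self.set_id()

    def set_id(self):
        # hash() over a tuple containing a str differs between processes
        self._id = hash((self.__class__, self.name))

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        return isinstance(other, Node) and self._id == other._id

    def __setstate__(self, state):
        self.__dict__.update(state)
        # the fix: never trust a pickled _id, recompute it in this process
        self.set_id()


node = Node("voltage")
node._id = 12345  # simulate a stale id carried over from another process
restored = pickle.loads(pickle.dumps(node))
assert hash(restored) == hash(Node("voltage"))  # id was refreshed on load
```

In a single process the refreshed hash equals the stale one unless it is forced to differ, which is why the unit test below stuffs a fake `_id` before pickling.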
- Hoist `pickle`, `subprocess`, and `sys` imports to module top so the tests do not trigger Pylint C0415 (import-outside-toplevel).
- Use `monkeypatch.setattr(var, "_id", 12345)` instead of direct assignment, and `hash(loaded)` instead of reading `loaded._id`, so the test no longer triggers W0212 (protected-access).

Behaviour is unchanged: the test still simulates a stale, cross-process `_id` and asserts that `Symbol.__setstate__` recomputes it.
Codacy runs Bandit, which flagged the pickle and subprocess usages in the pybamm-team#5444 regression tests:

- B403/B301 on `import pickle` and `pickle.loads(pickled)` — the pickled bytes come from a `pickle.dumps` call two lines earlier in the same test, so deserialisation is trusted.
- B404 on `import subprocess` and B603 on the two `subprocess.run` calls — the executable is `sys.executable` and the arg is a literal Python source string built in-process; there is no untrusted input.

Add narrowly scoped `# nosec` comments with codes and rationale so Codacy stops flagging these without globally suppressing the rules.
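The suppression pattern might look like the following sketch (function names and rationales are illustrative, the Bandit codes are the ones cited above):

```python
import pickle  # nosec B403 -- only deserialises bytes produced in this module
import subprocess  # nosec B404 -- child process is always sys.executable
import sys


def roundtrip_is_trusted():
    data = {"capacity": [0.0, 0.5, 1.0]}
    blob = pickle.dumps(data)
    # B301: the bytes come from pickle.dumps two lines above, so the
    # deserialisation input is trusted by construction
    return pickle.loads(blob) == data  # nosec B301


def child_prints_ok():
    # B603: the executable is sys.executable and the argument is a literal
    # Python source string built in-process; no untrusted input is involved
    result = subprocess.run(
        [sys.executable, "-c", "print('OK')"],
        capture_output=True,
        text=True,
        check=True,
    )  # nosec B603
    return "OK" in result.stdout
```

Scoping each `# nosec` to a single line with its code keeps the rest of the codebase covered by the same Bandit rules.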
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #5480   +/-   ##
=======================================
  Coverage   98.15%   98.15%
=======================================
  Files         338      338
  Lines       31115    31125      +10
=======================================
+ Hits        30542    30552      +10
  Misses        573      573
```
The cross-process pickle test for pybamm-team#5444 inlined `tmp_path` directly into a Python source string sent to a child interpreter. On Windows, the path contains backslashes (e.g. `\Users`, `\test_...`) that get parsed as escape sequences when the child compiles the source, corrupting the file path. Use `repr(str(pkl))` so the path is emitted as a properly escaped Python string literal regardless of platform.
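A minimal illustration of why `repr` is needed; the Windows-style path here is a made-up example:

```python
import pathlib

# Windows-style path whose backslashes form escape sequences (\U, \t)
# if the path is inlined verbatim into Python source for a child process.
pkl = str(pathlib.PureWindowsPath(r"C:\Users\test_tmp\sol.pkl"))

naive_source = 'open("' + pkl + '")'     # child sees \U and \t escapes
safe_source = "open(" + repr(pkl) + ")"  # repr emits an escaped literal

# The naive form does not even compile: \U expects 8 hex digits.
try:
    compile(naive_source, "<child>", "eval")
    naive_compiles = True
except SyntaxError:
    naive_compiles = False
assert not naive_compiles

# The repr-based form compiles, and the literal round-trips exactly.
compile(safe_source, "<child>", "eval")
assert eval(repr(pkl)) == pkl
```

Even when the naive form happens to compile (no `\U`-style sequences), escapes like `\t` are silently replaced with control characters, which is the quieter failure mode on paths such as `\test_...`.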
MarcBerliner left a comment:
Thanks for handling this @kamal20122012! Just a couple of minor points -- otherwise looks good to go
On `assert "OK" in result.stdout`:
Can we assert that the observable outputs from the save_code and load_code are the same?
Done in 7cd0fb5 — the test now captures the discharge-capacity observable in both processes and asserts the stdouts match.
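The pattern is roughly as follows; the file name and the observable dict are illustrative stand-ins for the real `Solution` objects:

```python
import os
import subprocess
import sys
import tempfile

pkl = os.path.join(tempfile.mkdtemp(), "sol.pkl")

# Save process: pickle an object and print the observable.
# Note {pkl!r}: the path is embedded as an escaped literal (see the
# Windows backslash fix above).
save_code = f"""
import pickle
data = {{"Discharge capacity [A.h]": [0.0, 0.5, 1.0]}}
with open({pkl!r}, "wb") as f:
    pickle.dump(data, f)
print(data["Discharge capacity [A.h]"])
"""

# Load process: a fresh interpreter (fresh hash seed) reloads and prints.
load_code = f"""
import pickle
with open({pkl!r}, "rb") as f:
    data = pickle.load(f)
print(data["Discharge capacity [A.h]"])
"""

save = subprocess.run([sys.executable, "-c", save_code],
                      capture_output=True, text=True, check=True)
load = subprocess.run([sys.executable, "-c", load_code],
                      capture_output=True, text=True, check=True)

# Comparing the two stdouts asserts the observable survives the
# cross-process round trip, not merely that loading did not crash.
assert save.stdout == load.stdout
```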
On this changelog entry:

> ## Bug fixes
>
> - Fixed cross-process pickle round-trip for `Solution`: accessing variables in `all_first_states` (or any not-yet-cached variable) on a `Solution` loaded in a different Python process previously failed with a `ModelError` from `Discretisation.y_slices`. `Symbol._id` is computed via `hash()` of strings, which is randomised per process; `Symbol.__setstate__` now refreshes `_id` so dicts keyed on Symbols are rebuilt with hashes consistent with the unpickling process. ([#5444](https://github.com/pybamm-team/PyBaMM/issues/5444))
Can you please make this more high-level and point the url to the PR instead of the issue?
Done in 7cd0fb5 — trimmed to a single high-level sentence and repointed at the PR.
- Test now captures the observable from both the saving and loading processes and asserts they match, instead of only checking that the load process printed an "OK" marker.
- Changelog entry trimmed to a single high-level sentence and pointed at the PR.
…t-states-5444 (merge commit; conflicts resolved in CHANGELOG.md)
Summary
`Symbol._id` is computed in `set_id()` via `hash(...)` of a tuple containing strings (`self.name`, etc.). Python's `hash()` of strings is randomised per process (`PYTHONHASHSEED`), so `_id` values cached in the pickling process are not equal to the values the same content produces in the unpickling process.

Dicts keyed on `Symbol`s — primarily `Discretisation.y_slices`, but also various processed-symbol caches — are rebuilt at unpickle time using the keys' (stale) `_id`. Any lookup that has to go through `Discretisation._process_symbol` for a fresh symbol therefore misses, and the user sees a `ModelError: No key set for variable '…'`. `model._variables_processed` is populated on first access and does survive pickle, which is why touching a variable before pickling masks the bug, and why reloading in the same process never hits it.

`Symbol.__setstate__` now refreshes `_id` after restoring state. Pickle calls `__setstate__` depth-first on the object graph and finishes restoring dict keys/values before populating the dict, so any rebuilt container ends up keyed by current-process hashes that match fresh lookups.

Fixes #5444
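The ordering claim, that dict keys are fully restored before the dict is populated, can be checked with a small sketch (a stand-in key class, not PyBaMM's `Symbol`):

```python
import pickle


class Key:
    """Stand-in for a hash-cached symbol used as a dict key."""

    def __init__(self, name):
        self.name = name
        self.set_id()

    def set_id(self):
        self._id = hash((self.__class__, self.name))

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        return isinstance(other, Key) and self._id == other._id

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.set_id()  # refresh before the containing dict is populated


# A stand-in for something like Discretisation.y_slices
slices = {Key("a"): [0, 1], Key("b"): [2, 3]}
restored = pickle.loads(pickle.dumps(slices))

# Because __setstate__ ran on each key before dict insertion, the rebuilt
# dict is indexed by current-process hashes, and fresh lookups succeed:
assert restored[Key("a")] == [0, 1]
assert restored[Key("b")] == [2, 3]
```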
Test plan
- `new_sol.all_first_states[0]["Discharge capacity [A.h]"]` returns data without error.
- `tests/unit/test_expression_tree/test_symbol.py::TestSymbol::test_setstate_refreshes_id` — pickles a Variable with a stuffed stale `_id` and asserts `__setstate__` recomputes it. Verified to fail on pre-fix code.
- `tests/unit/test_solvers/test_solution.py::TestSolution::test_pickle_first_states_across_processes` — runs save and load in separate Python subprocesses and asserts the variable access succeeds. Verified to fail on pre-fix code (the load subprocess raises `ModelError` and exits 1).
- `pytest tests/unit/test_solvers/ tests/unit/test_expression_tree/` — same baseline (601 passed, 21 pre-existing failures all from missing optional `skfem`/snapshot deps; identical numbers with and without the patch).
- `pre-commit run --files <changed files>` — all hooks pass.