fix: use lmdb_get to access already loaded data without re-reading #243

Open
Robert-Forrest wants to merge 2 commits into microsoft:main from materialsnexus:fix/LMDB_read_errors

@Robert-Forrest

This small PR is a fix for the following kind of error:

File "~/mattergen/.venv/bin/mattergen-evaluate", line 10, in <module>
    sys.exit(_main())
File "~/mattergen/mattergen/scripts/evaluate.py", line 60, in _main
    fire.Fire(main)
File "~/mattergen/.venv/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "~/mattergen/.venv/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
File "~/mattergen/.venv/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
File "~/mattergen/mattergen/scripts/evaluate.py", line 43, in main
    reference = LMDBGZSerializer().deserialize(reference_dataset_path)
File "~/mattergen/mattergen/evaluation/reference/reference_dataset_serializer.py", line 107, in deserialize
    impl=LMDBBackedReferenceDatasetImpl(lmdb_path, cleanup_dir=True),
File "~/mattergen/mattergen/evaluation/reference/reference_dataset_serializer.py", line 139, in __init__
    self._build_num_entries_by_chemsys_reduced_formulas(lmdb_path)
File "~/mattergen/mattergen/evaluation/reference/reference_dataset_serializer.py", line 150, in _build_num_entries_by_chemsys_reduced_formulas
    chemical_systems = lmdb_read_metadata(lmdb_path, "chemical_systems")
File "~/mattergen/mattergen/evaluation/utils/lmdb_utils.py", line 41, in lmdb_read_metadata
    with lmdb_open(db_path, readonly=True) as db:
File "~/mattergen/mattergen/evaluation/utils/lmdb_utils.py", line 21, in lmdb_open
    return lmdb.open(
lmdb.Error: The environment '****' is already open in this process.

This error started occurring after the release of this change to py-lmdb: jnwatson/py-lmdb@2b26c9f, which makes opening the same environment more than once in a process an error.

An alternative fix to this issue would be to pin the lmdb version used by mattergen to be <2.0.0, but then we remain vulnerable to the kinds of issues the change to lmdb aims to avoid (e.g. potential segfaults).

As far as I understand, MatterGen evaluation may currently be blocked in many environments, because a fresh installation can pick up the latest lmdb version.
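
For readers unfamiliar with this class of error, here is a minimal sketch of the general idea of keeping a single open handle per database path and reusing it, rather than opening the same path again. It is not the code in this PR: `EnvironmentRegistry` and its `opener` argument are hypothetical names introduced only for illustration, and the sketch deliberately avoids importing lmdb so that it runs on its own.

```python
import os
from typing import Any, Callable, Dict


class EnvironmentRegistry:
    """Keep at most one open handle per database path within a process.

    Hypothetical helper for illustration only; it is not part of mattergen
    or py-lmdb.
    """

    def __init__(self, opener: Callable[[str], Any]) -> None:
        # `opener` is whatever actually opens the database. With py-lmdb
        # it could be, for example: lambda p: lmdb.open(p, readonly=True)
        self._opener = opener
        self._handles: Dict[str, Any] = {}

    def get(self, path: str) -> Any:
        # Normalise the path so "db" and "./db" map to the same handle.
        key = os.path.abspath(path)
        if key not in self._handles:
            # First request for this path: open it exactly once.
            self._handles[key] = self._opener(key)
        # Later requests reuse the handle that is already open.
        return self._handles[key]

    def close_all(self) -> None:
        # Close every handle and forget it, so a later get() reopens cleanly.
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
```

With a registry like this, every reader that needs the reference dataset would call `get(lmdb_path)` instead of opening the path itself, so the environment is only ever opened once per process.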

@Robert-Forrest

@microsoft-github-policy-service agree [company="{your company}"]

@microsoft-github-policy-service agree company="Materials Nexus"
