fix: prevent duplicate manifest entries from concurrent table creation #6143
westonpace merged 3 commits into lance-format:main
Change conflict_retries from 0 to 5 in insert_into_manifest so that
cross-process races are handled correctly. When two processes
concurrently insert the same object_id, the second one hits a commit
version conflict. With conflict_retries > 0, MergeInsert retries by
re-evaluating the full plan against the latest data, where the join
detects the existing row and WhenMatched::Fail fires properly.
Previously, conflict_retries=0 meant the second operation would fail
with a generic TooMuchWriteContention error, but in some cases both
commits could succeed creating duplicate manifest entries ("Expected
exactly 1 table...found 2").
Add test with two independent ManifestNamespace instances racing on the
same directory to verify no duplicates are created.
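
The retry behavior described above can be sketched with a toy model. This is a minimal illustration in Python, not the Lance implementation: `ToyManifest`, `insert_into_manifest`, the `before_commit` hook, and the exception classes are all hypothetical stand-ins, and the model only shows how the error changes from a contention error to a clear "already exists" failure once a retry re-reads the latest data. It does not reproduce the duplicate-entry case.

```python
class CommitConflict(Exception):
    """The manifest version changed between read and commit."""


class TooMuchWriteContention(Exception):
    """The writer ran out of conflict retries."""


class AlreadyExists(Exception):
    """Models WhenMatched::Fail: the object_id is already present."""


class ToyManifest:
    """Hypothetical versioned manifest with compare-and-swap commits."""

    def __init__(self):
        self.version = 0
        self.rows = []

    def snapshot(self):
        return self.version, list(self.rows)

    def commit(self, read_version, object_id):
        # Optimistic concurrency: reject the commit if someone else got in first.
        if read_version != self.version:
            raise CommitConflict(f"read v{read_version}, latest is v{self.version}")
        self.rows.append(object_id)
        self.version += 1


def insert_into_manifest(manifest, object_id, conflict_retries, before_commit=None):
    for attempt in range(conflict_retries + 1):
        # Each attempt re-evaluates the plan against the latest snapshot.
        version, rows = manifest.snapshot()
        if object_id in rows:
            raise AlreadyExists(object_id)
        if before_commit is not None:
            before_commit(attempt)  # test hook: lets a rival writer slip in
        try:
            manifest.commit(version, object_id)
            return
        except CommitConflict:
            continue
    raise TooMuchWriteContention(object_id)


def race(conflict_retries):
    """Deterministic race: a rival inserts the same id just before our first commit."""
    manifest = ToyManifest()

    def rival_wins_first(attempt):
        if attempt == 0:
            insert_into_manifest(manifest, "default$t", conflict_retries=0)

    try:
        insert_into_manifest(manifest, "default$t", conflict_retries, rival_wins_first)
    except Exception as exc:
        return type(exc).__name__, manifest.rows
    return "ok", manifest.rows


print(race(0))  # loser gets the generic contention error
print(race(5))  # loser retries, sees the existing row, and fails clearly
```

In this model the manifest ends up with exactly one row either way; the difference is only which error the losing writer sees.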
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review
Overall: Clean, well-motivated fix with a good test. The change from […]
CI […]
Test note (minor): The test uses […]
No other issues found. LGTM once the format check passes.
Automated review by Claude Code
Codecov Report: ✅ All modified and coverable lines are covered by tests.
The DuckDB-related failures seem to have been fixed on main. Merged with latest.
// When two processes concurrently insert the same object_id, the second one
// hits a commit conflict. With conflict_retries > 0, the retry re-evaluates
// the full MergeInsert plan against the latest data, where the join detects
// the existing row and WhenMatched::Fail fires, producing a clear error.
Why 5 and not 1? Even if we have 10 transactions running at the same time won't transactions 2-10 all hit the conflict error before they attempt to commit again (so 1 retry is enough)?
I have no deep reason for 5.
FWIW, Claude came up with a scenario in which multiple distinct tables are created concurrently: namespaces are backed by Lance tables, so commits can conflict underneath even when different tables are being created (not just the same table).
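
The difference between the two scenarios in this thread can be sketched with a deliberately pessimistic toy model. It assumes every pending writer reads the same version and then races to commit, with exactly one compare-and-swap winning per round. `worst_case_retries` is a hypothetical helper, not Lance code, and real writers are unlikely to stay in perfect lockstep.

```python
def worst_case_retries(num_writers, same_table):
    """Largest number of retries any writer needs in the lockstep model."""
    version = 0
    committed = set()
    pending = ["t" if same_table else f"t{i}" for i in range(num_writers)]
    most_retries = 0
    attempt = 0
    while pending:
        # All pending writers read the same snapshot before anyone commits.
        read_version, visible = version, set(committed)
        losers = []
        for table in pending:
            if table in visible:
                # The row already exists: fail fast (WhenMatched::Fail), no more retries.
                most_retries = max(most_retries, attempt)
                continue
            if read_version == version:
                # First committer of the round wins.
                version += 1
                committed.add(table)
                most_retries = max(most_retries, attempt)
            else:
                # Version moved on: commit conflict, try again next round.
                losers.append(table)
        pending = losers
        attempt += 1
    return most_retries


print(worst_case_retries(10, same_table=True))
print(worst_case_retries(10, same_table=False))
```

Under these assumptions, writers racing on the same table need at most one retry, which matches the reviewer's reasoning, while writers creating distinct tables can keep losing rounds to each other, so the last of N writers needs N - 1 retries.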
Here's an example I run into occasionally
```
...
  File "/home/runner/work/geneva/geneva/src/geneva/state/manager.py", line 35, in __init__
    self.table = alter_or_create_table(
  File "/home/runner/work/geneva/geneva/src/geneva/utils/schema.py", line 138, in alter_or_create_table
    return db.create_table(table_name, schema=schema, namespace=namespace)
  File "/home/runner/work/geneva/geneva/src/geneva/db.py", line 403, in create_table
    return Table(self, name, namespace=namespace, storage_options=storage_options)
  File "/home/runner/work/geneva/geneva/src/geneva/table.py", line 489, in __init__
    self._ltbl  # noqa
  File "/home/runner/.local/share/uv/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/home/runner/work/geneva/geneva/src/geneva/table.py", line 543, in _ltbl
    tbl = inner.open_table(self.name, namespace=self._namespace)
  File "/home/runner/work/geneva/geneva/.venv/lib/python3.10/site-packages/lancedb/namespace.py", line 392, in open_table
    response = self._ns.describe_table(request)
  File "/home/runner/work/geneva/geneva/.venv/lib/python3.10/site-packages/lance/namespace.py", line 362, in describe_table
    response_dict = self._inner.describe_table(request.model_dump())
OSError: LanceError(IO): Expected exactly 1 table with id 'default$geneva_manifests', found 2, /home/runner/work/lance/lance/rust/lance-namespace-impls/src/dir/manifest.rs:642:21
```
## Summary

Cherry-picks bug fixes onto `release/v3.0` for the v3.0.1 patch release:

- **#6160** - fix: handle `DataType::Null` in `adjust_child_validity` to prevent panic
- **#6187** - fix: handle nullable validity layers without def levels
- **#6143** - fix: prevent duplicate manifest entries from concurrent table creation
- **#6212** - chore: bump lz4_flex patch versions
- **#6146** - fix: replace fetch_arrow_table with to_arrow_table

## Test plan

- CI passes on cherry-picked commits (both PRs were already merged and tested on main)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Jonathan Hsieh <jon@lancedb.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>