fix: concurrent read and write to directory namespace #5983
jackye1995 merged 11 commits into lance-format:main
PR Review: feat(namespace): add Throttled error and configurable commit retries

Overall, this PR adds important concurrency handling for namespace operations. The test coverage is comprehensive. A few items to consider:

P1: Semantic concern with error mapping

The mapping of

    if error_msg.contains("CommitConflict")
        || error_msg.contains("Failed to commit the transaction after")
    {
        NamespaceError::Throttled { ... }
    }

"Throttled" typically implies rate limiting by the service, while

Consider using a more specific error variant like

Minor: Unnecessary clone in
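To make the P1 concern concrete, the quoted check can be isolated into a small, self-contained function. This is only a sketch: the `NamespaceError` enum here is a hypothetical two-variant stand-in (the `Internal` fallback and the `message` fields are assumptions, not the crate's definition), and the sample messages are invented for illustration.

```rust
/// Hypothetical stand-in for the namespace error type; only what the
/// sketch needs is modeled, and the field names are assumptions.
#[derive(Debug, PartialEq)]
enum NamespaceError {
    Throttled { message: String },
    Internal { message: String },
}

/// Classify an error purely by substring match, mirroring the check
/// quoted in the review.
fn map_by_message(error_msg: &str) -> NamespaceError {
    if error_msg.contains("CommitConflict")
        || error_msg.contains("Failed to commit the transaction after")
    {
        NamespaceError::Throttled { message: error_msg.to_string() }
    } else {
        NamespaceError::Internal { message: error_msg.to_string() }
    }
}

fn main() {
    // Illustrative messages only; real error text may differ.
    let samples = [
        "CommitConflict: schema changed by a concurrent overwrite",
        "Failed to commit the transaction after 20 retries",
        "object store unavailable",
    ];
    for msg in samples {
        match map_by_message(msg) {
            NamespaceError::Throttled { message } => println!("throttled: {message}"),
            NamespaceError::Internal { message } => println!("internal: {message}"),
        }
    }
}
```

Under these assumptions, any message containing either substring lands in `Throttled`, whatever its underlying cause, so a string-based mapping cannot tell different kinds of conflict apart.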
    @pytest.mark.skipif(
        sys.platform == "win32",
There are some general problems with Windows paths, which I will fix fully in a separate PR.
wjones127 left a comment:
I think some of the errors might be mixed up.
I like the tests though.
From #5983 (comment), we currently use `CommitConflict` for two situations:

1. Incompatible transactions: there is a conflict that is not retry-able. For example, you are trying to create an index, but a concurrent transaction overwrote the table and changed the schema.
2. Commit step ran out of retries: we hit the max number of rebase attempts, and even though we could retry again, we aren't. This is indeed just throttling.

This makes them separate errors.
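One way to picture the split described above is a typed error with one variant per situation, so callers can decide whether to retry without parsing messages. The sketch below is hypothetical: `CommitError`, `CallerAction`, and both helper functions are names invented for illustration and do not reflect the actual Lance types.

```rust
/// Hypothetical error type that keeps the two situations apart.
#[derive(Debug, Clone, PartialEq)]
enum CommitError {
    /// A concurrent transaction invalidated this one; retrying cannot help.
    IncompatibleTransaction { reason: String },
    /// The rebase loop stopped after `attempts` tries; a later retry may succeed.
    RetriesExhausted { attempts: u32 },
}

/// What a caller would reasonably do with each kind of error.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CallerAction {
    GiveUp,
    RetryLater,
}

fn caller_action(err: &CommitError) -> CallerAction {
    match err {
        CommitError::IncompatibleTransaction { .. } => CallerAction::GiveUp,
        CommitError::RetriesExhausted { .. } => CallerAction::RetryLater,
    }
}

fn describe(err: &CommitError) -> String {
    match err {
        CommitError::IncompatibleTransaction { reason } => {
            format!("incompatible transaction: {reason}")
        }
        CommitError::RetriesExhausted { attempts } => {
            format!("gave up after {attempts} rebase attempts")
        }
    }
}

fn main() {
    let errors = [
        CommitError::IncompatibleTransaction {
            reason: "schema changed by a concurrent overwrite".to_string(),
        },
        CommitError::RetriesExhausted { attempts: 20 },
    ];
    for err in &errors {
        println!("{} -> {:?}", describe(err), caller_action(err));
    }
}
```

With distinct variants, the retry decision becomes an exhaustive `match`, so adding a third situation later would produce a compile error at every call site that needs to handle it.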
## Summary

- Add `Throttled` error type (code 21) to lance-namespace for rate limiting and concurrent operation throttling.
- Add `commit_retries` config to the DirectoryNamespace builder (default 20) to control the inner commit retry count.
- For MergeInserts run in DirectoryNamespace, make sure those set the outer retries to 0.
- Ensure `object_id` is set as the primary key in the `__manifest` table so that primary-key-based dedupe is enabled.
- Add `commit_retries()` method to `MergeInsertBuilder` for controlling inner manifest write retries.
- Properly map error types: `CommitConflict` → `Throttled` (safe to retry), `TooMuchWriteContention` → `ConcurrentModification` (semantic conflict).
- Add comprehensive concurrent create/drop tests for Python, Java, and Rust with an S3 backend.

Notes:

1. Also tested with lance-trino to make sure it solves the problems in Trino for concurrent reads and writes.
2. I added tests in both Java and Python to make sure the bindings do not introduce additional issues for concurrent access.
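The `commit_retries` setting from the summary can be illustrated with a bounded retry loop. This is a minimal sketch, not the DirectoryNamespace implementation: `CommitConfig`, `Throttled`, and `commit_with_retries` are hypothetical names, and it assumes that a retry count of N means one initial attempt plus up to N retries. Only the default of 20 is taken from the summary.

```rust
/// Hypothetical config; only `commit_retries` is modeled.
#[derive(Debug, Clone, Copy)]
struct CommitConfig {
    commit_retries: u32,
}

impl Default for CommitConfig {
    fn default() -> Self {
        // Default value stated in the PR summary.
        Self { commit_retries: 20 }
    }
}

/// Hypothetical error returned once the retry budget is spent.
#[derive(Debug, PartialEq)]
struct Throttled {
    attempts: u32,
}

/// Calls `try_commit` once, then retries up to `commit_retries` more times
/// (assumption: 0 retries still means a single attempt).
/// Returns the 1-based attempt number that succeeded.
fn commit_with_retries<F>(config: CommitConfig, mut try_commit: F) -> Result<u32, Throttled>
where
    F: FnMut(u32) -> bool,
{
    let max_attempts = config.commit_retries.saturating_add(1);
    for attempt in 1..=max_attempts {
        if try_commit(attempt) {
            return Ok(attempt);
        }
    }
    Err(Throttled { attempts: max_attempts })
}

fn main() {
    // A simulated commit that only succeeds on its third attempt.
    match commit_with_retries(CommitConfig::default(), |attempt| attempt == 3) {
        Ok(attempts) => println!("committed after {attempts} attempt(s)"),
        Err(e) => println!("throttled after {} attempt(s)", e.attempts),
    }

    // With zero retries, a failing commit is reported after one attempt.
    match commit_with_retries(CommitConfig { commit_retries: 0 }, |_| false) {
        Ok(attempts) => println!("committed after {attempts} attempt(s)"),
        Err(e) => println!("throttled after {} attempt(s)", e.attempts),
    }
}
```

Setting one layer's retry count to 0 while another layer retries, as the summary describes for MergeInserts, keeps the total number of attempts bounded by a single loop instead of the product of two nested ones.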
🤖 Generated with Claude Code