Skip to content

fix: concurrent read and write to directory namespace#5983

Merged
jackye1995 merged 11 commits intolance-format:mainfrom
jackye1995:dir-fixes
Feb 26, 2026
Merged

fix: concurrent read and write to directory namespace#5983
jackye1995 merged 11 commits intolance-format:mainfrom
jackye1995:dir-fixes

Conversation

@jackye1995
Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 commented Feb 23, 2026

Summary

  • Add Throttled error type (code 21) to lance-namespace for rate limiting and concurrent operation throttling
  • Add commit_retries config to DirectoryNamespace builder (default 20) to control inner commit retry count.
  • For MergeInserts run in DirectoryNamespace, make sure those set outer to 0.
  • Ensure object_id is set as the primary key in the __manifest table to ensure we enable primary-key-based dedupe.
  • Add commit_retries() method to MergeInsertBuilder for controlling inner manifest write retry
  • Properly map error types: CommitConflictThrottled (safe to retry), TooMuchWriteContentionConcurrentModification (semantic conflict)
  • Add comprehensive concurrent create/drop tests for Python, Java, and Rust with S3 backend

Notes:

  1. Also tested with lance-trino to make sure it solves the problems in Trino for concurrent reads and writes
  2. I added tests in both java and python to make sure the binding does not introduce additional issue for concurrent access.

🤖 Generated with Claude Code

@github-actions github-actions Bot added enhancement New feature or request python java labels Feb 23, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Review: feat(namespace): add Throttled error and configurable commit retries

Overall, this PR adds important concurrency handling for namespace operations. The test coverage is comprehensive. A few items to consider:

P1: Semantic concern with error mapping

The mapping of CommitConflictThrottled (lines in manifest.rs:831-841) may be semantically misleading:

if error_msg.contains("CommitConflict")
    || error_msg.contains("Failed to commit the transaction after")
{
    NamespaceError::Throttled { ... }
}

"Throttled" typically implies rate limiting by the service, while CommitConflict indicates version conflicts from concurrent writes. Clients might expect different retry strategies for each:

  • Throttled → exponential backoff respecting service limits
  • CommitConflict → immediate retry with fresh state

Consider using a more specific error variant like CommitRetryExhausted or documenting clearly that Throttled is used for all retriable contention scenarios.

Minor: Unnecessary clone in delete_from_manifest

self.manifest_dataset
    .set_latest(Arc::try_unwrap(new_dataset).unwrap_or_else(|arc| (*arc).clone()))
    .await;

Since new_dataset is an Arc<Dataset> returned from DeleteBuilder::execute(), try_unwrap will likely always fail (reference count > 1), resulting in an unnecessary clone. Consider just using Arc::unwrap_or_clone (Rust 1.76+) or accepting the Arc directly if the API allows.

Note: High retry counts in tests

Tests use commit_retries: "1000" which is fine for testing, but ensure the default of 20 is sufficient for production S3 workloads where latency can vary significantly.


The concurrent tests are well-structured and mirror the same patterns across Rust, Python, and Java. The debug logging additions will be helpful for troubleshooting production issues.

@jackye1995 jackye1995 marked this pull request as draft February 23, 2026 06:53
@jackye1995 jackye1995 changed the title feat(namespace): add Throttled error and configurable commit retries for concurrent operations fix: concurrent read and write to directory namespace Feb 24, 2026
@github-actions github-actions Bot added the bug Something isn't working label Feb 24, 2026
@jackye1995 jackye1995 marked this pull request as ready for review February 24, 2026 18:35


@pytest.mark.skipif(
sys.platform == "win32",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are in general some problems in windows path, that I will fix fully in a separated PR

Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the errors might be mixed up.

I like the tests though.

Comment thread rust/lance-namespace-impls/src/dir/manifest.rs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 24, 2026

wjones127 added a commit that referenced this pull request Feb 24, 2026
From
#5983 (comment),
we currently use `CommitConflict` for two situations:

1. Incompatible transactions: there is a conflict that is not
retry-able. For example, you are trying to create an index, but a
concurrent transaction overwrote the table and changed the schema.
1. Commit step ran out of retries: we hit the max number of rebase
attempts, and even though we could retry again, we aren't. This is
indeed just throttling.

This makes them separate errors.
@jackye1995 jackye1995 requested a review from wjones127 February 25, 2026 00:45
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 25, 2026
From
lance-format#5983 (comment),
we currently use `CommitConflict` for two situations:

1. Incompatible transactions: there is a conflict that is not
retry-able. For example, you are trying to create an index, but a
concurrent transaction overwrote the table and changed the schema.
1. Commit step ran out of retries: we hit the max number of rebase
attempts, and even though we could retry again, we aren't. This is
indeed just throttling.

This makes them separate errors.
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 25, 2026
From
lance-format#5983 (comment),
we currently use `CommitConflict` for two situations:

1. Incompatible transactions: there is a conflict that is not
retry-able. For example, you are trying to create an index, but a
concurrent transaction overwrote the table and changed the schema.
1. Commit step ran out of retries: we hit the max number of rebase
attempts, and even though we could retry again, we aren't. This is
indeed just throttling.

This makes them separate errors.
@jackye1995 jackye1995 merged commit fb3bbf5 into lance-format:main Feb 26, 2026
27 of 28 checks passed
jackye1995 added a commit to wjones127/lance that referenced this pull request Feb 26, 2026
)

## Summary

- Add `Throttled` error type (code 21) to lance-namespace for rate
limiting and concurrent operation throttling
- Add `commit_retries` config to DirectoryNamespace builder (default 20)
to control inner commit retry count.
- For MergeInserts run in DirectoryNamespace, make sure those set outer
to 0.
- Ensure `object_id` is set as the primary key in the `__manifest` table
to ensure we enable primary-key-based dedupe.
- Add `commit_retries()` method to `MergeInsertBuilder` for controlling
inner manifest write retry
- Properly map error types: `CommitConflict` → `Throttled` (safe to
retry), `TooMuchWriteContention` → `ConcurrentModification` (semantic
conflict)
- Add comprehensive concurrent create/drop tests for Python, Java, and
Rust with S3 backend

Notes:
1. Also tested with lance-trino to make sure it solves the problems in
Trino for concurrent reads and writes
2. I added tests in both java and python to make sure the binding does
not introduce additional issue for concurrent access.
wjones127 added a commit that referenced this pull request Feb 26, 2026
From
#5983 (comment),
we currently use `CommitConflict` for two situations:

1. Incompatible transactions: there is a conflict that is not
retry-able. For example, you are trying to create an index, but a
concurrent transaction overwrote the table and changed the schema.
1. Commit step ran out of retries: we hit the max number of rebase
attempts, and even though we could retry again, we aren't. This is
indeed just throttling.

This makes them separate errors.
wjones127 pushed a commit that referenced this pull request Feb 26, 2026
## Summary

- Add `Throttled` error type (code 21) to lance-namespace for rate
limiting and concurrent operation throttling
- Add `commit_retries` config to DirectoryNamespace builder (default 20)
to control inner commit retry count.
- For MergeInserts run in DirectoryNamespace, make sure those set outer
to 0.
- Ensure `object_id` is set as the primary key in the `__manifest` table
to ensure we enable primary-key-based dedupe.
- Add `commit_retries()` method to `MergeInsertBuilder` for controlling
inner manifest write retry
- Properly map error types: `CommitConflict` → `Throttled` (safe to
retry), `TooMuchWriteContention` → `ConcurrentModification` (semantic
conflict)
- Add comprehensive concurrent create/drop tests for Python, Java, and
Rust with S3 backend

Notes:
1. Also tested with lance-trino to make sure it solves the problems in
Trino for concurrent reads and writes
2. I added tests in both java and python to make sure the binding does
not introduce additional issue for concurrent access.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request java python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants