diff --git a/docs/src/format/table/index/system/frag_reuse.md b/docs/src/format/table/index/system/frag_reuse.md
index f0ba6acab2f..a508b52b920 100644
--- a/docs/src/format/table/index/system/frag_reuse.md
+++ b/docs/src/format/table/index/system/frag_reuse.md
@@ -31,5 +31,27 @@ The index accumulates a new **reuse version** every time a compaction is execute
 As long as all the scalar and vector indices are created after the specific reuse version,
 the indices are all caught up and the specific reuse version can be trimmed.
 
-It is expected that the user schedules an additional process to trim the index periodically
-to keep the list of reuse versions in control.
\ No newline at end of file
+## Impacts
+
+### Conflict Resolution
+
+The presence of the Fragment Reuse Index changes how Lance detects conflicts between concurrent
+operations. Operations that would normally conflict with compaction (such as index building) can
+proceed without conflict when the FRI is in use. For full details on how conflict detection is
+affected, see [conflict resolution](../../transaction.md#conflict-resolution).
+
+### Index Load Cost
+
+When the FRI is present, indices must be remapped at load time. Each time an index is loaded into
+the cache, the FRI is applied to translate old row addresses to current ones. This adds a small
+cost to index loading but does not affect query performance once the index is cached.
+
+### FRI Growth and Cleanup
+
+The FRI grows with each compaction. Every compaction that defers index remapping adds a new reuse
+version to the index. Over time, this can accumulate and increase the cost of index loading since
+more address translations must be applied.
+
+Once all scalar and vector indices have been rebuilt past a given reuse version, that version is no
+longer needed and can be trimmed. Users should schedule a periodic process to trim stale reuse
+versions and keep the FRI size under control.
\ No newline at end of file
diff --git a/docs/src/guide/performance.md b/docs/src/guide/performance.md
index 11ca2e23b72..f75eae77d8f 100644
--- a/docs/src/guide/performance.md
+++ b/docs/src/guide/performance.md
@@ -169,6 +169,47 @@ The default is 128 which is more than enough for most workloads. You can increas
 with a high-throughput workload. You can even disable this limit entirely by setting it to zero.
 Note that this can often lead to issues with excessive retries and timeouts from the object store.
 
+## Conflict Handling
+
+Lance supports concurrent operations on the same table using optimistic concurrency control. When two
+operations conflict, one of them must be retried. Retries are handled automatically but they repeat
+work that has already been done, which can hurt throughput. Understanding and minimizing conflicts is
+important for maintaining good performance in write-heavy workloads.
+
+Common sources of conflicts include:
+
+- Concurrent compaction and index building, since both need to modify the same indices
+- Update operations that affect the same fragments, since both need to rewrite the same data files
+
+For more details on which operations conflict with each other, see
+[conflict resolution](../format/table/transaction.md#conflict-resolution).
+
+### Fragment Reuse Index
+
+Compaction is one of the most expensive write operations because it rewrites data files and, by
+default, remaps all indices to reflect the new row addresses. When compaction and index building
+run concurrently, they often conflict because both need to modify the same indices. This typically
+causes the compaction to fail and retry, and repeated failures can cause table layout to degrade
+over time.
+
+The Fragment Reuse Index (FRI) solves this by allowing compaction to skip the index remap step.
+Instead of immediately updating indices, compaction records a mapping from old fragment row
+addresses to new ones. When indices are loaded into the cache, the FRI is applied to translate
+the old row addresses to the current ones. This adds a small cost to index load time but does
+not affect query performance once the index is cached.
+
+This decoupling means compaction and index building no longer conflict, which is especially
+valuable for tables that are continuously ingesting data while also maintaining indices.
+
+To enable the FRI, set `defer_index_remap=True` when compacting:
+
+```python
+dataset.optimize.compact_files(defer_index_remap=True)
+```
+
+For details on the index format and usage patterns, see the
+[Fragment Reuse Index specification](../format/table/index/system/frag_reuse.md).
+
 ## Indexes
 
 Training and searching indexes can have unique requirements for compute and memory. This section provides some