17 changes: 9 additions & 8 deletions docs/src/format/table/index.md
@@ -183,18 +183,19 @@ See the `transaction.proto` file for its definition.
The commit process is as follows:

1. The writer finishes writing all data files.
2. The writer creates a transaction file in the `_transactions` directory.
This file describes the operations that were performed, which is used for two purposes:
(1) to detect conflicts, and (2) to re-build the manifest during retries.
2. Build a transaction struct based on the current version to tell how the manifest changes. Different operations have different structures for metadata changing.
3. Look for any new commits since the writer started writing.
If there are any, read their transaction files and check for conflicts.
If there are any conflicts, abort the commit. Otherwise, continue.
4. Build a manifest and attempt to commit it to the next version.
If there are any, begin the rebase process: read their transaction structures and check for conflicts.
If there are any conflicts and the conflicts are not retryable, abort the commit.
If the conflicts are retryable, attempt to resolve conflicts between the candidate transaction and the conflicting transaction. For example, if they delete disjoint sets of rows, then the deletion files can be merged. If conflicts can't be resolved, the retryable conflict error is bubbled up. This can be handled as a full retry of the write operation.
4. Create a transaction file in the _transactions directory which describes the operations that were performed for two purposes:
Contributor

We might want to update this soon with #4774

(1) to detect conflicts, and (2) to re-build the manifest during retries.
5. Build a manifest and attempt to commit it to the next version.
If the commit fails because another writer has already committed, go back to step 3.
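
As a rough illustration of the check-then-commit loop in steps 3 to 5, here is a minimal Python sketch. It is not Lance's implementation: `InMemoryTable`, `put_if_absent`, `conflicts`, `CommitConflict`, and the operation names are hypothetical stand-ins, the conflict rule is a placeholder, and the transaction-file write and the rebase step are left out.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when a concurrent commit cannot be committed over (hypothetical)."""


@dataclass
class Transaction:
    read_version: int  # version the writer started from
    operation: str     # illustrative names such as "append" or "delete"


@dataclass
class InMemoryTable:
    """Toy stand-in for a table directory; not the real storage layout."""
    committed: dict = field(default_factory=dict)  # version -> Transaction

    def latest_version(self) -> int:
        return max(self.committed, default=0)

    def put_if_absent(self, version: int, txn: Transaction) -> bool:
        # Models an atomic "create the manifest only if it does not exist yet".
        if version in self.committed:
            return False
        self.committed[version] = txn
        return True


def conflicts(mine: Transaction, other: Transaction) -> bool:
    # Placeholder rule for illustration only: two deletes conflict.
    return mine.operation == "delete" and other.operation == "delete"


def commit(table: InMemoryTable, txn: Transaction, max_attempts: int = 5) -> int:
    base = txn.read_version
    for _ in range(max_attempts):
        latest = table.latest_version()
        # Step 3: inspect every commit that landed since we last looked.
        for version in range(base + 1, latest + 1):
            if conflicts(txn, table.committed[version]):
                raise CommitConflict(f"conflict with version {version}")
        base = latest
        # Step 5: try to claim the next version atomically.
        if table.put_if_absent(latest + 1, txn):
            return latest + 1
        # Another writer won the race, so loop back to step 3.
    raise CommitConflict("too many concurrent commits")
```

The key design point is that the version claim is atomic: a writer that loses the race never overwrites a manifest, it only re-checks the commits it has not yet seen.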

When checking whether two transactions conflict, be conservative.
If the transaction file is missing, assume it conflicts.
If the transaction file has an unknown operation, assume it conflicts.
If the transaction structure is missing, assume it conflicts.
If the transaction structure has an unknown operation, assume it conflicts.
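
The conservative rule above can be sketched in a few lines of Python. This is only an illustration: `KNOWN_OPERATIONS`, `COMPATIBLE`, and `may_conflict` are hypothetical names, and the compatibility table is a made-up subset rather than the format's real conflict matrix.

```python
from typing import Optional

# Illustrative subset of operations this (hypothetical) reader understands.
KNOWN_OPERATIONS = {"append", "delete", "overwrite"}

# Made-up pairs (mine, theirs) that are assumed safe to commit concurrently.
COMPATIBLE = {("append", "append"), ("append", "delete"), ("delete", "append")}


def may_conflict(mine: str, other: Optional[dict]) -> bool:
    """Return True unless the two transactions are known to be compatible."""
    if other is None:
        # The other transaction is missing: assume it conflicts.
        return True
    operation = other.get("operation")
    if operation not in KNOWN_OPERATIONS:
        # Unknown (or absent) operation: assume it conflicts.
        return True
    return (mine, operation) not in COMPATIBLE
```

Defaulting to "conflict" means an older reader that meets a newer operation fails safely instead of committing over a change it does not understand.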

### External Manifest Store

Binary file modified docs/src/images/conflict_resolution_flow.png
Contributor

I think we could actually just make this a mermaid diagram. Then it's more understandable by AIs and can just be edited as text.

Here's how I would write it:

flowchart LR
    A[Write data files] --> C[Check for concurrent commits]
    A -.-> DF{{data/31a7060e-4898-4ecd-a428-afbff3539fa6.lance}}
    C --> D{Are there conflicts?}
    
    D -->|None| E[Write transaction file]
    E -.-> Txn{{_transactions/42-76019405-8d5a-43c3-a7a2-324ed49a9d75.txn}}
    D-->|Resolvable| G[Resolve conflicts] --> E
    G -.->|merged| Deletions{{_deletions/31a7060e-4898-4ecd-a428-afbff3539fa6.lance}}
    D -->|Retryable| H[Retry operation 🔄]
    D -->|Non-retryable| F[Abort ✗]
    
    E --> I[Atomically write manifest]
    I -.-> Manifest{{_versions/43.manifest}}
    I --> J{Success?}
    
    J -->|Yes| K[Complete ✓]
    J -->|No| C
    
    style A fill:#e1f5fe
    style K fill:#c8e6c9
    style F fill:#ffcdd2
    style H fill:#fff3e0
    style DF fill:#ddd
    style Txn fill:#ddd
    style Manifest fill:#ddd
    style Deletions fill:#ddd

This basically describes four outcomes of checking for conflicts:

  1. No conflicts
  2. Some conflicts, but we can resolve them just using the transactions
  3. Conflict, but it allows retries. So we can just retry the operation.
  4. Conflict, non-retryable. This is rare, but it happens, for example, if we are trying to append but another transaction completely changed the schema to remove columns we are trying to insert into.
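
To make the four outcomes concrete, here is a small Python sketch. Everything in it is an assumption for illustration: `Outcome`, `Txn`, `classify`, and the operation names are hypothetical, and treating overlapping deletes as retryable is a guess at one plausible rule, not a statement about the actual conflict resolver.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Optional, Tuple


class Outcome(Enum):
    NO_CONFLICT = auto()
    RESOLVED = auto()       # merged using only the transactions
    RETRYABLE = auto()      # the caller should redo the whole operation
    NON_RETRYABLE = auto()  # the caller should abort


@dataclass(frozen=True)
class Txn:
    kind: str                                # "append", "delete", "drop_columns"
    rows: FrozenSet[int] = frozenset()       # rows deleted by a delete
    columns: FrozenSet[str] = frozenset()    # columns written or dropped


def classify(mine: Txn, theirs: Txn) -> Tuple[Outcome, Optional[FrozenSet[int]]]:
    """Return the outcome and, when resolved, the merged set of deleted rows."""
    if mine.kind == "append" and theirs.kind == "drop_columns":
        if mine.columns & theirs.columns:
            # The columns we want to insert into no longer exist.
            return Outcome.NON_RETRYABLE, None
    if mine.kind == "delete" and theirs.kind == "delete":
        if mine.rows & theirs.rows:
            # Assumed rule: overlapping deletes need a full retry.
            return Outcome.RETRYABLE, None
        # Disjoint deletes: the deletion sets can simply be merged.
        return Outcome.RESOLVED, mine.rows | theirs.rows
    return Outcome.NO_CONFLICT, None
```

For example, `classify(Txn("delete", rows=frozenset({1, 2})), Txn("delete", rows=frozenset({3})))` takes the resolvable branch and returns the union of both deletion sets.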

Contributor

Sorry, I'm a bit late to this conversation; I did not notice this PR. I just published #5209, which refreshes a lot of content for the table format and the overall format intro. I also updated conflict resolution with a mermaid diagram, but it is a bit different from this one. I will check tomorrow to see if I can reuse the content here.

Contributor Author

@majin1102 Nov 14, 2025

OK, I guess this PR would be useless.

> If the conflicts are retryable, attempt to resolve conflicts between the candidate transaction and the conflicting transaction. For example, if they delete disjoint sets of rows, then the deletion files can be merged. If conflicts can't be resolved, the retryable conflict error is bubbled up. This can be handled as a full retry of the write operation.

Thanks for correcting me, @wjones127. I had misunderstood what "retryable" means.

Is this good to close? @jackye1995
