Automatic conflict resolution #3068

@wjones127


Right now, when any two update/delete transactions run concurrently, it is common for one of them to return a CommitConflict error. This makes it hard to apply many updates. The best workaround for now is to run updates serially, but it would be nice if that weren't required.

We already have a retry loop for the case where the transactions are compatible. If they need resolution, however, we just return an error. What if we could automatically resolve the conflicts instead?

Resolving conflicts

In the commit loop, transactions would just be a single transaction. But in other use cases, we might want to be able to consolidate all the changes into a single transaction.

This loop would need to load all the previous transactions committed since each pending transaction's read_version. To make this faster, we should consider the caching described in #3057.
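To make the loading step concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: `Txn`, `committed_since`, and the `log` mapping (version number to the transaction that produced that version) are not the actual Lance API. It assumes we load once, starting from the oldest pending read_version, and then filter per pending transaction.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    """Hypothetical, simplified stand-in for a transaction."""
    read_version: int
    op: str  # e.g. "append" or "delete"


def committed_since(log: dict[int, Txn], pending: list[Txn]) -> list[tuple[int, Txn]]:
    """Return (version, txn) pairs committed after the oldest pending read_version.

    `log` is an assumed in-memory view of the commit history; in practice this
    is where a transaction cache would be consulted.
    """
    if not pending:
        return []
    oldest = min(t.read_version for t in pending)
    return [(v, log[v]) for v in sorted(log) if v > oldest]


def relevant_to(txn: Txn, loaded: list[tuple[int, Txn]]) -> list[Txn]:
    """Filter the loaded history down to what one pending transaction missed."""
    return [t for v, t in loaded if v > txn.read_version]
```

Loading from the minimum read_version means the history is read once even when several pending transactions are being consolidated.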

A few illustrative examples of what this could do:

  1. Two concurrent deletes: each delete transaction will have its own deletion files, and potentially removed fragments. We should union the sets of removed fragment ids and write a new deletion file that is the union of the two.
  2. A concurrent append and delete: should be transformed into a single transaction that contains both.
  3. Two concurrent appends: should handle modified fragments the same way as deletions, and then combine the appends the same way.
  4. Add column and append: if the new column is nullable, we will be able to append using the subschema.
    1. What if it is not nullable? Then it will fail?
  5. Overwrite and append: the append can be ignored. In fact, any other transaction could be ignored.
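As a rough illustration of the first example (two concurrent deletes), the merge could look something like the Python sketch below. The `Delete` type and `merge_deletes` function are hypothetical, and a deletion file is modeled as a plain set of row offsets per fragment. Skipping deletion entries for fragments that the other transaction removed entirely is an assumption of this sketch, not something decided above.

```python
from dataclasses import dataclass, field


@dataclass
class Delete:
    """Hypothetical, simplified model of a delete transaction."""
    removed_fragment_ids: set[int] = field(default_factory=set)
    # fragment id -> deleted row offsets (stand-in for a deletion file)
    deleted_rows: dict[int, set[int]] = field(default_factory=dict)


def merge_deletes(a: Delete, b: Delete) -> Delete:
    """Combine two concurrent deletes into a single delete."""
    # Union the sets of removed fragment ids.
    removed = a.removed_fragment_ids | b.removed_fragment_ids
    rows: dict[int, set[int]] = {}
    for d in (a, b):
        for frag_id, offsets in d.deleted_rows.items():
            if frag_id in removed:
                # Assumption: a fully removed fragment needs no deletion file.
                continue
            # Union the deletion files fragment by fragment.
            rows.setdefault(frag_id, set()).update(offsets)
    return Delete(removed, rows)
```

For example, merging a delete that removed fragment 1 and deleted rows {0, 1} of fragment 2 with a delete that removed fragment 3 and deleted rows {1, 4} of fragment 2 gives removed fragments {1, 3} and deleted rows {0, 1, 4} for fragment 2.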

TODO

  • Auto conflict resolution for update

Labels: enhancement
