Right now, it's common for any two concurrent update/delete transactions to have one of them fail with a `CommitConflict` error. This makes it hard to run many updates concurrently. The best workaround for now is to perform updates serially, but it would be nice if that weren't required.
We already have a retry loop that handles the case where transactions are compatible. If they need resolution, however, we just return an error. What if instead we could automatically resolve the conflicts?
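A minimal sketch of the idea, with hypothetical names (none of these types exist in the codebase): instead of the retry loop failing on any overlap, a resolver classifies each concurrent transaction as compatible, mergeable, or a genuine conflict.

```rust
// Hypothetical sketch, not the actual API: classify a pair of concurrent
// operations instead of unconditionally returning CommitConflict.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Append,
    Delete,
    Overwrite,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Compatible, // no real overlap; the existing retry loop can handle it
    Resolved,   // the two transactions can be merged into one
    Conflict,   // genuinely incompatible; return CommitConflict as today
}

// Hypothetical resolver: given our pending operation and one concurrent,
// already-committed operation, decide how to proceed.
fn resolve(ours: Op, theirs: Op) -> Outcome {
    match (ours, theirs) {
        // An overwrite could ignore any other concurrent transaction.
        (Op::Overwrite, _) => Outcome::Compatible,
        // Two deletes, two appends, or an append plus a delete could be
        // consolidated into a single combined transaction (examples below).
        (Op::Delete, Op::Delete)
        | (Op::Append, Op::Append)
        | (Op::Append, Op::Delete)
        | (Op::Delete, Op::Append) => Outcome::Resolved,
        _ => Outcome::Conflict,
    }
}

fn main() {
    assert_eq!(resolve(Op::Overwrite, Op::Append), Outcome::Compatible);
    assert_eq!(resolve(Op::Delete, Op::Delete), Outcome::Resolved);
    assert_eq!(resolve(Op::Append, Op::Overwrite), Outcome::Conflict);
    println!("ok");
}
```

The exact merge rules per operation pair are the open design question; the table of examples below enumerates the cases.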
Resolving conflicts
In the commit loop, `transactions` would just be a single transaction. But in other use cases, we might want to be able to consolidate all the changes into a single transaction.
This loop would need to load all the transactions committed since each pending transaction's `read_version`. To make this faster, we should consider the caching described in #3057.
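Concretely, the set of transactions to reconcile against is one per version between `read_version` and the current latest version. A tiny sketch (helper name is hypothetical):

```rust
// Sketch: every version committed after this transaction's read_version,
// up to and including the latest version, may carry a transaction we have
// to check against (and possibly merge with). With the caching described
// in #3057, loading these could be served from memory.
fn versions_to_reconcile(read_version: u64, latest_version: u64) -> Vec<u64> {
    (read_version + 1..=latest_version).collect()
}

fn main() {
    // A transaction that read version 5 while the table is now at version 8
    // must be reconciled against the transactions for versions 6, 7, and 8.
    assert_eq!(versions_to_reconcile(5, 8), vec![6, 7, 8]);
}
```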
A few examples illustrate what this could do:
- Two concurrent deletes: each delete transaction will have its own deletion files, and potentially removed fragments. We should take the union of the removed fragment IDs, and write a new deletion file that is the union of the two.
- A concurrent append and delete: these should be transformed into a single transaction that contains both.
- Two concurrent appends: modified fragments should be handled the same way as for deletions, and the appended fragments combined in the same way.
- Add column and append: if the new column is nullable, we will be able to append with the subschema.
  - What if it is not nullable? Then it would presumably have to fail.
- Overwrite and append: the append can be ignored. In fact, an overwrite could ignore any other concurrent transaction.
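The two-concurrent-deletes case above can be sketched concretely. All types here are hypothetical stand-ins (a real deletion file is not an in-memory set), but the merge logic is the one described: union the removed fragment IDs, and for fragments both transactions touched, produce one deletion set that is the union of the two.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for a delete transaction's effects.
#[derive(Debug, Clone, PartialEq)]
struct DeleteTxn {
    removed_fragments: BTreeSet<u64>,
    // fragment id -> row offsets marked deleted (stand-in for a deletion file)
    deletions: BTreeMap<u64, BTreeSet<u64>>,
}

fn merge_deletes(a: &DeleteTxn, b: &DeleteTxn) -> DeleteTxn {
    // Union of the removed fragment ids.
    let removed_fragments: BTreeSet<u64> =
        a.removed_fragments.union(&b.removed_fragments).copied().collect();
    // For each fragment, the merged deletion set is the union of the two.
    let mut deletions = a.deletions.clone();
    for (frag, rows) in &b.deletions {
        deletions.entry(*frag).or_default().extend(rows.iter().copied());
    }
    // Deletion files for fully removed fragments are no longer needed.
    deletions.retain(|frag, _| !removed_fragments.contains(frag));
    DeleteTxn { removed_fragments, deletions }
}

fn main() {
    let a = DeleteTxn {
        removed_fragments: BTreeSet::from([1]),
        deletions: BTreeMap::from([(3, BTreeSet::from([10, 11]))]),
    };
    let b = DeleteTxn {
        removed_fragments: BTreeSet::from([2]),
        deletions: BTreeMap::from([(3, BTreeSet::from([11, 12]))]),
    };
    let merged = merge_deletes(&a, &b);
    assert_eq!(merged.removed_fragments, BTreeSet::from([1, 2]));
    assert_eq!(merged.deletions[&3], BTreeSet::from([10, 11, 12]));
}
```

The append/append and append/delete cases would follow the same pattern: union the fragment-level changes, then concatenate the new fragments.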
TODO