KAFKA-10847: Delete Time-ordered duplicated records using deleteRange() internally #10537
guozhangwang merged 3 commits into apache:trunk from spena:change_time_ordered_key_schema
Conversation
```java
}

@Override
public void remove(final Bytes key, final long timestamp) {
```
I wasn't sure how to name this method. I initially called it removeRange(key, from, to), but I don't want to support a time range with a specific key, because the time-ordered key schema would delete other keys that fall between the from and to keys.
So I thought of using just a single timestamp, to make sure this is not called with a time range. But removeRange(key, timestamp) does not look like a range, so I ended up just calling it remove. Any thoughts?
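For illustration, here is a minimal standalone sketch (the class and helper names are hypothetical, not the actual Streams code) of why a single (key, timestamp) pair maps to one contiguous range under the (time,key,seq) layout, so `remove` can be implemented with one `deleteRange` over the time+key prefix:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: under the (time,key,seq) layout, every duplicate
// of one (key, timestamp) pair shares the same time+key prefix, so the
// duplicates form one contiguous range that a single deleteRange over
// that prefix can remove.
public class TimeOrderedKeyDemo {

    // Serialize (time,key,seq): 8-byte big-endian timestamp, the raw
    // key bytes, then a 4-byte big-endian sequence number.
    static byte[] toStoreKeyBinary(final byte[] key, final long timestamp, final int seq) {
        return ByteBuffer.allocate(8 + key.length + 4)
                .putLong(timestamp)
                .put(key)
                .putInt(seq)
                .array();
    }

    // The prefix shared by all duplicates of (key, timestamp).
    static byte[] prefix(final byte[] key, final long timestamp) {
        return ByteBuffer.allocate(8 + key.length)
                .putLong(timestamp)
                .put(key)
                .array();
    }

    static boolean hasPrefix(final byte[] storeKey, final byte[] p) {
        return storeKey.length >= p.length
                && Arrays.equals(Arrays.copyOf(storeKey, p.length), p);
    }

    // Unsigned lexicographic order, as RocksDB compares keys by default.
    static int compare(final byte[] a, final byte[] b) {
        final int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            final int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(final String[] args) {
        final byte[] key = "A".getBytes();
        final byte[] p = prefix(key, 100L);
        // Duplicates (same key and time, different seq) share the prefix;
        // a different key or a different time does not, so deleting the
        // prefix range removes exactly the duplicates.
        System.out.println(hasPrefix(toStoreKeyBinary(key, 100L, 0), p));
        System.out.println(hasPrefix(toStoreKeyBinary(key, 100L, 1), p));
        System.out.println(hasPrefix(toStoreKeyBinary("B".getBytes(), 100L, 0), p));
        System.out.println(hasPrefix(toStoreKeyBinary(key, 101L, 0), p));
    }
}
```

With the old (time,seq,key) order the sequence number sits between the time and the key, so duplicates of the same key are interleaved with other keys and no single contiguous range covers them.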
I think just calling remove is totally fine :)
guozhangwang left a comment:
LGTM overall! Just minor comments.
```java
 * @param timestamp
 * @return The key that represents the prefixed Segmented key in bytes.
 */
default Bytes toBinary(final Bytes key, long timestamp) {
```
nit: how about "toStoreBinaryKeyPrefix"?
```diff
  * A {@link RocksDBSegmentedBytesStore.KeySchema} to serialize/deserialize a RocksDB store
- * key into a schema combined of (time,seq,key). This key schema is more efficient when doing
- * range queries between a time interval. For key range queries better use {@link WindowKeySchema}.
+ * key into a schema combined of (time,key,seq).
```
nit: Add a note that since the key is variable length while time/seq are fixed length, formatting in this order makes a varying time range query very inefficient, since we'd need to be very conservative in picking the from/to boundaries; however, for now we do not expect any varying time range access at all, only fixed time ranges.
```java
    throw new UnsupportedOperationException();
}

public static Bytes toStoreKeyBinary(final Bytes key,
```
ditto: toStoreKeyBinaryPrefix.
Thanks @guozhangwang , I applied the changes.

Thanks @spena , I will merge after green builds.

Merged to trunk, thanks @spena !
This PR changes the `TimeOrderedKeySchema` composite key from `time-seq-key` -> `time-key-seq` to allow deletion of duplicated time-key records using the RocksDB `deleteRange` API. It also removes all duplicates when `put(key, null)` is called. Previously, `put(key, null)` was a no-op, which caused problems because there was no way to delete any keys when duplicates are allowed.

The RocksDB `deleteRange(keyFrom, keyTo)` deletes a range of keys from `keyFrom` (inclusive) to `keyTo` (exclusive). To make `keyTo` inclusive, the end key is incremented by one when calling the `RocksDBAccessor`.

Committer Checklist (excluded from commit message)
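As a rough sketch of that increment trick (a hypothetical helper, not the actual `RocksDBAccessor` code): treat the key bytes as an unsigned big-endian number and add one, carrying over 0xFF bytes, so every key less than or equal to the original end key sorts strictly below the new bound.

```java
import java.util.Arrays;

// Hypothetical sketch of making deleteRange's exclusive upper bound
// inclusive: increment the end key by one, with carry, so every key
// <= the original end key falls strictly below the new bound.
public class InclusiveUpperBound {

    static byte[] increment(final byte[] key) {
        final byte[] out = Arrays.copyOf(key, key.length);
        for (int i = out.length - 1; i >= 0; i--) {
            if (out[i] != (byte) 0xFF) {
                out[i]++;
                return out;
            }
            out[i] = 0x00;  // carry into the next higher byte
        }
        // All bytes were 0xFF: no same-length successor exists; appending
        // a zero byte yields the next key in lexicographic order.
        return Arrays.copyOf(key, key.length + 1);
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(increment(new byte[] {0x00, 0x01})));        // [0, 2]
        System.out.println(Arrays.toString(increment(new byte[] {0x00, (byte) 0xFF}))); // [1, 0]
    }
}
```

With such a bound, `deleteRange(from, increment(to))` deletes `to` itself as well.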