[Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression (PLR) to speed up int type key search for LSM-tree (bourbon idea) #211
Open
liufuyang wants to merge 2 commits into
liufuyang commented Jan 16, 2021
    itr.reset()

-   idx := itr.seekBlock(key)
+   idx := itr.seekBlockWithPlr(key)
Change this single line back to itr.seekBlock(key) to turn off the PLR (Bourbon) feature.
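For readers unfamiliar with the Bourbon idea, here is a minimal sketch of how a model prediction can narrow a block search while keeping plain binary search as a fallback. It is not the code in this PR: `seekBlockBinary`, `seekBlockPredicted`, the `firstKeys` slice and the `maxErr` bound are all hypothetical names, and the return convention (index of the last block whose first key is <= key) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// seekBlockBinary returns the index of the last block whose first key is <= key,
// or -1 if key sorts before every block. (Hypothetical helper, not badger's API.)
func seekBlockBinary(firstKeys []uint32, key uint32) int {
	return sort.Search(len(firstKeys), func(i int) bool { return firstKeys[i] > key }) - 1
}

// seekBlockPredicted gives the same answer, but first tries a small window of
// +/- maxErr blocks around the model's predicted block index. If the window
// does not bracket the key, it falls back to a full binary search.
func seekBlockPredicted(firstKeys []uint32, key uint32, predicted float64, maxErr int) int {
	n := len(firstKeys)
	if n == 0 {
		return -1
	}
	// Clamp the prediction into [0, n-1]; the negated comparison also catches NaN.
	if !(predicted >= 0) {
		predicted = 0
	}
	if predicted > float64(n-1) {
		predicted = float64(n - 1)
	}
	guess := int(math.Round(predicted))
	lo, hi := guess-maxErr, guess+maxErr
	if lo < 0 {
		lo = 0
	}
	if hi > n-1 {
		hi = n - 1
	}
	// The window is usable only if the answer cannot lie outside it.
	if (lo > 0 && firstKeys[lo] > key) || (hi < n-1 && firstKeys[hi+1] <= key) {
		return seekBlockBinary(firstKeys, key)
	}
	window := firstKeys[lo : hi+1]
	return lo + sort.Search(len(window), func(i int) bool { return window[i] > key }) - 1
}

func main() {
	firstKeys := []uint32{10, 20, 30, 40, 50}
	fmt.Println(seekBlockPredicted(firstKeys, 35, 2.4, 1)) // good prediction: window search
	fmt.Println(seekBlockPredicted(firstKeys, 35, 0, 0))   // bad prediction: falls back
}
```

The fallback keeps the result correct even when the model is badly off, so a wrong prediction costs time rather than correctness.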
liufuyang commented Jan 16, 2021
    // k := y.KeyWithTs([]byte(fmt.Sprintf("%016x", i)), 0)
    bs := make([]byte, 4)
    binary.BigEndian.PutUint32(bs, uint32(i))
    k := y.KeyWithTs(bs, 0)
This test runs slowly and doesn't work on Pingcap/badger's master branch, so we didn't use it for evaluation in the end.
liufuyang commented Jan 16, 2021
        log.Println(filename)
        log.Printf("plrSegment loaded: %v", data)
        plr = &plrSegments{inner: data, FName: filename}
    }
The code above loads the PLR model from the .mod file.
liufuyang commented Jan 16, 2021
        }
        segment := s.inner[segmentIndex]
        return segment.Slope*key + segment.Intercept, nil
    }
This code does the model prediction: it evaluates the selected segment's line (slope * key + intercept) at the given key.
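A self-contained sketch of the whole prediction step may help here, including how a segment could be picked before the line is evaluated. Only `Slope` and `Intercept` appear in the hunk above; the `Start` field, the `segment` type name and the `predict` function are assumptions for illustration, and the actual segment lookup in this PR may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// segment is one linear piece of the model. Start (the smallest key the
// segment covers) is a hypothetical field added for this sketch.
type segment struct {
	Start     float64
	Slope     float64
	Intercept float64
}

// predict finds the last segment whose Start is <= key and evaluates its line.
// segs must be sorted by Start.
func predict(segs []segment, key float64) (float64, error) {
	i := sort.Search(len(segs), func(i int) bool { return segs[i].Start > key }) - 1
	if i < 0 {
		return 0, errors.New("key is not covered by any segment")
	}
	s := segs[i]
	return s.Slope*key + s.Intercept, nil
}

func main() {
	segs := []segment{
		{Start: 0, Slope: 0.25, Intercept: 0},
		{Start: 100, Slope: 0.5, Intercept: -25},
	}
	for _, key := range []float64{40, 120} {
		pos, err := predict(segs, key)
		fmt.Println(key, pos, err)
	}
}
```

Because the segments are sorted by their starting key, the lookup is a binary search over segments, which are normally far fewer than blocks or keys.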
liufuyang commented Jan 16, 2021
        // only support key that can turn into uint
        blockIndex := len(b.baseKeys.endOffs) - 1
        b.mw.Write([]byte(fmt.Sprintf("%d,%d\n", binary.BigEndian.Uint32(firstKey), blockIndex)))
    }
Here we dump each block's min key and block index; these pairs are the training data for the PLR model.
f279d02 to 6601369
Disclaimer
Original implementation by @spongedu: https://github.com/spongedu/badger/tree/hackathon_go_plr_with_pointget_uint64key
I ported it over and did some cleanup work.
Idea from the paper: From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees, by Yifan Dai et al.
Rust code source:
plr (which is used in the badger code) code here: https://github.com/liufuyang/plr-test/blob/master/src/main.rs
plr library code here: https://github.com/RyanMarcus/plr
Benchmark code here:
https://github.com/liufuyang/tbadger/blob/master/badger_test.go
With PLR, with pointGet
Without PLR, with pointGet
With PLR, without pointGet
Without PLR, without pointGet
Summary:
Turn off badger's index/pointGet to see the effect of PLR vs. bare binary search.