Skip to content

[Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression (PLR) to speed up int type key search for LSM-tree (bourbon idea) #211

Open
liufuyang wants to merge 2 commits into
pingcap:masterfrom
liufuyang:plt-based-on-master
Open

[Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression (PLR) to speed up int type key search for LSM-tree (bourbon idea) #211
liufuyang wants to merge 2 commits into
pingcap:masterfrom
liufuyang:plt-based-on-master

Conversation

@liufuyang
Copy link
Copy Markdown

@liufuyang liufuyang commented Jan 16, 2021

Disclaimer

Original implementation made by @spongedu at here https://github.com/spongedu/badger/tree/hackathon_go_plr_with_pointget_uint64key
I ported over and did some clean up work.

Idea from paper: From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees, by Yifan Dai

Rust code source:

Benchmark code here:

https://github.com/liufuyang/tbadger/blob/master/badger_test.go

With PLR, with pointGet

(fuyang-go-bench-badger *%) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:36:38 plr opening file: data/00000004.sst
2021/01/16 18:36:38 data/00000004.sst
2021/01/16 18:36:38 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.040000
-------> Hit rate: 0.042700
-------> Hit rate: 0.039912
-------> Hit rate: 0.039828
-------> Hit rate: 0.039901
40003507	       985 ns/op
PASS
ok  	tbadger	69.299s

WithoutPLR, with pointGet

(fuyang-go-bench-badger *%) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:40:05 plr opening file: data/00000004.sst
2021/01/16 18:40:05 data/00000004.sst
2021/01/16 18:40:05 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.060000
-------> Hit rate: 0.039500
-------> Hit rate: 0.039985
-------> Hit rate: 0.039894
-------> Hit rate: 0.039881
38633751	      1025 ns/op
PASS
ok  	tbadger	60.518s

WithPLR, without pointGet

(master %) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:52:33 plr opening file: data/00000004.sst
2021/01/16 18:52:33 data/00000004.sst
2021/01/16 18:52:33 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.030000
-------> Hit rate: 0.039600
-------> Hit rate: 0.039574
-------> Hit rate: 0.039863
-------> Hit rate: 0.039875
18224655	      1865 ns/op
PASS
ok  	tbadger	60.895s

WithoutPLR, without pointGet

(master %) 🦀 👉  go test -bench=BenchmarkBadger  -benchtime=30s -run=^s
2021/01/16 18:50:51 plr opening file: data/00000004.sst
2021/01/16 18:50:51 data/00000004.sst
2021/01/16 18:50:51 plrSegment loaded: [{0 43371 0.002763512135689483 1.301027126230357} {43371 173143 0.001957717574102289 32.139598333991884} {173143 436947 0.0016452183457147617 82.19281535952956} {436947 833712 0.0014415577306141915 167.1788822889456} {833712 1.393807e+06 0.0013017392682151607 279.7244833388081} {1.393807e+06 2.144502e+06 0.0012110451657496644 402.0680961828125} {2.144502e+06 3.243884e+06 0.001120694858783678 591.6862955068373} {3.243884e+06 4.252725e+06 0.001064653793312162 769.4048384938992} {4.252725e+06 5.489191e+06 0.0010158619299836292 972.7984525022184} {5.489191e+06 7.260306e+06 0.0009696070011254119 1222.5316082781937} {7.260306e+06 8.683148e+06 0.000932753999857703 1485.8753435962026} {8.683148e+06 1.0541136e+07 0.0009026047455994166 1743.6066808302094} {1.0541136e+07 1.2621543e+07 0.0008782587667118885 1996.1946893039676} {1.2621543e+07 1.4617779e+07 0.0008541220294640251 2296.6850507190775} {1.4617779e+07 1.7976931348623157e+308 0.0008369829862892688 2543.212360918902}]
-------> Hit rate: 0.000000
goos: darwin
goarch: amd64
pkg: tbadger
BenchmarkBadger-12    	-------> Hit rate: 0.030000
-------> Hit rate: 0.037500
-------> Hit rate: 0.039512
-------> Hit rate: 0.039923
14039182	      2240 ns/op
PASS
ok  	tbadger	34.574s

Summary:

  • With PLR, with pointGet 985 ns/op
  • W/ot PLR, with pointGet 1025 ns/op (Baseline)

Turn off badger index/pointGet to see the effect of PLR vs bare binary search

  • With PLR, w/ot pointGet 1865 ns/op
  • W/ot PLR, w/ot pointGet 2240 ns/op

Comment thread table/sstable/iterator.go
itr.reset()

idx := itr.seekBlock(key)
idx := itr.seekBlockWithPlr(key)
Copy link
Copy Markdown
Author

@liufuyang liufuyang Jan 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can change this single line to itr.seekBlock(key) to turn off PLR (bourbon) feature

@liufuyang liufuyang changed the title [Final version] Plt (bourbon) change based on master [Hackathon2020 - Do not merge] Plt (bourbon) change based on master Jan 16, 2021
// k := y.KeyWithTs([]byte(fmt.Sprintf("%016x", i)), 0)
bs := make([]byte, 4)
binary.BigEndian.PutUint32(bs, uint32(i))
k := y.KeyWithTs(bs, 0)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test runs slow and doesn't work on Pingcap/badger's master branch. So we didn't use this to evaluate in the end.

Comment thread table/sstable/table.go
log.Println(filename)
log.Printf("plrSegment loaded: %v", data)
plr = &plrSegments{inner: data, FName: filename}
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code above for loading the plr model from the .mod file

Comment thread table/sstable/table.go
}
segment := s.inner[segmentIndex]
return segment.Slope*key + segment.Intercept, nil
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code for model prediction

Comment thread table/sstable/builder.go
// only support key that can turn into uint
blockIndex := len(b.baseKeys.endOffs) - 1
b.mw.Write([]byte(fmt.Sprintf("%d,%d\n", binary.BigEndian.Uint32(firstKey), blockIndex)))
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we dump the block's min key and block's index for training model plr

@liufuyang liufuyang force-pushed the plt-based-on-master branch from f279d02 to 6601369 Compare January 16, 2021 18:05
@liufuyang liufuyang changed the title [Hackathon2020 - Do not merge] Plt (bourbon) change based on master [Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression to speed up int type key search for LSM-tree (bourbon idea) Jan 16, 2021
@liufuyang liufuyang changed the title [Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression to speed up int type key search for LSM-tree (bourbon idea) [Hackathon2020 - Do not merge] Greedy-piecewise-linear-regression (PLR) to speed up int type key search for LSM-tree (bourbon idea) Jan 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant