pkg/profile: Use buffer pools to reduce allocs #1056
marselester wants to merge 5 commits into parca-dev:main
|
Could you also attach profiles if you have any? |
|
Not a seasoned Go programmer, so let me know what you think about an approach like this instead:
    buf := make([]byte, 0, MAX_SHARD_SIZE)
    unwindTable := bytes.NewBuffer(buf)
    func updateTable() {
        unwindTable.Reset() // reset the slice, reusing the underlying memory buffer
        // do work involving `unwindTable`
    }
I think the explicitness of this code makes it a bit easier to read, and it might make it harder to introduce correctness issues. |
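The pseudocode above can be turned into a small runnable sketch. This is only an illustration of the single-reused-buffer idea, not code from the PR: `maxShardSize`, `tableWriter`, and `update` are hypothetical names, and the capacity value is an arbitrary placeholder because the thread never states the real constant.

```go
package main

import (
	"bytes"
	"fmt"
)

// maxShardSize is a hypothetical stand-in for MAX_SHARD_SIZE from the
// pseudocode; the real value is not given in this thread.
const maxShardSize = 1 << 10

// tableWriter owns one preallocated buffer and reuses it across updates.
type tableWriter struct {
	buf *bytes.Buffer
}

func newTableWriter() *tableWriter {
	return &tableWriter{buf: bytes.NewBuffer(make([]byte, 0, maxShardSize))}
}

// update resets the buffer (length goes to 0, capacity is kept) and writes
// the rows into it. The returned slice aliases the internal buffer, so it
// is only valid until the next call to update.
func (w *tableWriter) update(rows [][]byte) []byte {
	w.buf.Reset()
	for _, r := range rows {
		w.buf.Write(r)
	}
	return w.buf.Bytes()
}

func main() {
	w := newTableWriter()
	fmt.Printf("%s\n", w.update([][]byte{[]byte("ab"), []byte("cd")}))
	fmt.Printf("%s\n", w.update([][]byte{[]byte("xyz")}))
}
```

One trade-off to keep in mind: a single shared buffer like this is not safe for concurrent callers, so it fits best when updates happen from one goroutine.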
This always allocated a new byte slice. The idea behind the |
Yeah, exactly that! |
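For comparison, the buffer-pool approach named in the PR title is commonly written with `sync.Pool`. The sketch below is a generic illustration and not the PR's actual code; `bufPool` and `encodeWithPool` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values. New is only called
// when the pool has nothing to offer.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeWithPool concatenates rows using a pooled buffer.
func encodeWithPool(rows [][]byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	for _, r := range rows {
		buf.Write(r)
	}

	// Copy the result out: the buffer returns to the pool, so its backing
	// array must not be referenced by the caller afterwards.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	fmt.Printf("%s\n", encodeWithPool([][]byte{[]byte("ab"), []byte("cd")}))
}
```

The copy at the end is the usual correctness hazard with pools: it costs one allocation per call, but handing out the pooled slice directly would let a later caller overwrite it.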
|
I can't find the constant that you're talking about, how much memory would we be talking about? |
|
Hi! Thank you for the feedback!
    go tool pprof dist/parca-agent allocs.pprof
So I checked
    go tool pprof -source_path=github.com/parca-dev/parca-agent/ dist/parca-agent allocs.pprof
Most allocations happen in
For some reason pprof doesn't reuse
For a start, I was thinking of reusing the parca-agent buffers, even though that would be just a tiny improvement in allocs.
    go tool pprof -source_path=/home/vagrant/go/pkg/mod/ dist/parca-agent allocs.pprof
    go tool pprof -source_path=/snap/go/current/src/ dist/parca-agent allocs.pprof |
It's not implemented yet; I was suggesting the approach shown in the pseudocode above. |
|
I couldn't find
BTW, we can get rid of |
This code is gated under a feature flag, you would need
That being said seems worth taking a look 😄 |
c0a28d2 to 624871b
69f13c1 to 33f6195
|
Leaving the decision to merge to @javierhonduco |
javierhonduco left a comment
Looks good, but could you run the end-to-end tests locally? The past PRs include some examples of how to test it under the "Test Plan" section: https://github.com/parca-dev/parca-agent/pulls?q=is%3Apr+is%3Aclosed+eh_frame
f4d6277 to 0310530
|
I wrote a benchmark for
    // BenchmarkBPF measures repeated writes of one process's unwind table
    // into the BPF maps.
    func BenchmarkBPF(b *testing.B) {
        // Load the embedded BPF object and release it when the benchmark ends.
        m, err := bpf.NewModuleFromBufferArgs(bpf.NewModuleArgs{
            BPFObjBuff: bpfObj,
            BPFObjName: "parca",
        })
        require.NoError(b, err)
        err = m.BPFLoadObject()
        require.NoError(b, err)
        b.Cleanup(m.Close)

        bpfMaps, err := initializeMaps(m, byteorder.GetHostByteOrder())
        require.NoError(b, err)

        // Build the unwind table once, outside the timed loop, using PID 1.
        pid := 1
        tb := unwind.NewUnwindTableBuilder(log.NewNopLogger())
        pt, err := tb.UnwindTableForPid(pid)
        require.NoError(b, err)

        b.ResetTimer()
        b.ReportAllocs()
        for n := 0; n < b.N; n++ {
            err = bpfMaps.setUnwindTable(pid, pt)
            require.NoError(b, err)
        }
    }
|
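Outside a full benchmark, the effect of reusing a buffer can also be checked with `testing.AllocsPerRun` from the standard library. The following is a minimal sketch unrelated to the parca-agent code paths; `freshAllocs`, `reusedAllocs`, and `sink` are hypothetical names, and the 4096-byte payload is an arbitrary choice.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the work away.
var sink []byte

// freshAllocs reports average allocations per run when a new backing
// slice is created on every iteration.
func freshAllocs(payload []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		buf := bytes.NewBuffer(make([]byte, 0, len(payload)))
		buf.Write(payload)
		sink = buf.Bytes()
	})
}

// reusedAllocs reports average allocations per run when one preallocated
// buffer is reset and refilled on every iteration.
func reusedAllocs(payload []byte) float64 {
	buf := bytes.NewBuffer(make([]byte, 0, len(payload)))
	return testing.AllocsPerRun(100, func() {
		buf.Reset()
		buf.Write(payload)
		sink = buf.Bytes()
	})
}

func main() {
	payload := make([]byte, 4096)
	fmt.Println("fresh buffer allocs/run: ", freshAllocs(payload))
	fmt.Println("reused buffer allocs/run:", reusedAllocs(payload))
}
```

A check like this is cheap to keep next to a benchmark, since it fails loudly if a later change reintroduces per-call allocations.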
This is expected. We plan to reduce calls to this drastically. Maybe we should first wait for that PR to land. |
|
This is expected. As Kemal mentioned, we are actively working on this area and will be publishing the changes over the next few weeks. We added a simple table generation benchmark to keep track of performance changes over time, as there's definitely a lot of low-hanging fruit. We mostly prioritised correctness over performance until now! 😄
That being said, I think your change to add the
Just to give you a taste of the changes I am working on (will open a PR in the next few weeks), this is how things are looking now. There's still a lot of performance left on the table, though! |
|
Awesome, thank you for the explanation! |
0310530 to 4fb6b0b
4fb6b0b to 01f2497
|
@javierhonduco any thoughts on this? |
|
Hi @marselester! Sorry for the delay! This code has changed significantly since then. I have checked some memory profiles and I think the cost has shifted a little bit. If it's OK, we can maybe close this PR, and if you are interested, I'd be more than happy to point you in a few days to other areas where we need to reduce allocations / CPU cycles 😄 |
|
@javierhonduco no worries :) I would appreciate the pointers for sure! |
|
@marselester sure thing, happy to get back to you next week :) |