Adding buffer pools for stream decompression #69
Merged
Viq111 merged 3 commits into DataDog:1.x on Sep 25, 2019
Conversation
Digging into pprof showed a lot of time spent in allocations and in GC. The allocations seemed to be mostly concentrated in `NewReaderDict`, where two []byte slices were allocated. The sizes of those slices are based on compile-time decisions in the zstd C library. In an attempt to improve that situation, two changes were made:
* The sizes of those buffers are read, cast, and checked once.
* The buffers come from a pair of `sync.Pool`s in the `NewReaderDict` factory method and are returned to the pool in `reader.Close` (see the sketch just below).
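For illustration, here is a minimal sketch of the pooling pattern, assuming placeholder buffer sizes; the package, pool, type, and constructor names are stand-ins, and in the real library the sizes come from the zstd C API (e.g. `ZSTD_DStreamInSize` / `ZSTD_DStreamOutSize`) rather than constants:
```
// Minimal sketch only, not the library's actual code.
package zstdpool

import "sync"

const (
	srcBufSize = 128 << 10 // placeholder for the size reported by the C library
	dstBufSize = 128 << 10 // placeholder for the size reported by the C library
)

var (
	// One pool per buffer kind; New allocates a slice of the checked size
	// only when the pool is empty.
	srcPool = sync.Pool{New: func() interface{} { return make([]byte, srcBufSize) }}
	dstPool = sync.Pool{New: func() interface{} { return make([]byte, dstBufSize) }}
)

// reader keeps the borrowed scratch buffers for the lifetime of one stream.
type reader struct {
	src, dst []byte
}

// newReader borrows its buffers from the pools instead of allocating fresh
// slices on every call, which removes most of the per-stream allocations.
func newReader() *reader {
	return &reader{
		src: srcPool.Get().([]byte),
		dst: dstPool.Get().([]byte),
	}
}
```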
Using the following test:
```
$ PAYLOAD=zstd_decompress.c go test \
-bench BenchmarkStreamDecompression \
-benchtime=5s \
-benchmem
```
The number of bytes allocated per decompression dropped to roughly 1/705 of its previous value:
```
pkg: github.com/TriggerMail/zstd
BenchmarkStreamDecompression 100000 121617 ns/op 595.48 MB/s 384 B/op 10 allocs/op
pkg: github.com/DataDog/zstd
BenchmarkStreamDecompression 50000 156142 ns/op 463.81 MB/s 270640 B/op 10 allocs/op
```
Note that this improvement covers only the Go-managed heap memory, not the additional heap allocations made on the C side when creating the zstd context. Because those are freed explicitly in `reader.Close`, the remaining concern for them is heap fragmentation rather than GC pressure.
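Continuing the earlier sketch (same illustrative names), the Close side returns the Go buffers to their pools; the real `reader.Close` would additionally free the C-side decompression context explicitly:
```
// Close returns the pooled buffers so later readers can reuse them. In the
// real reader it would also release the zstd C decompression context
// explicitly (omitted here).
func (r *reader) Close() error {
	srcPool.Put(r.src)
	dstPool.Put(r.dst)
	r.src, r.dst = nil, nil
	return nil
}
```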
Viq111
approved these changes
Sep 25, 2019
Collaborator
Viq111
left a comment
Great change!
Indeed, I ran the benchmark locally as well (fyi, you can use benchstat to compare before/after results easily), and it looks promising for both small and large payloads:
Small (zstd_decompress.c):
```
name                    old time/op    new time/op    delta
StreamDecompression-4     198µs ± 9%     120µs ± 3%  -39.76%  (p=0.000 n=10+10)

name                    old speed      new speed      delta
StreamDecompression-4   366MB/s ±10%   606MB/s ± 3%  +65.50%  (p=0.000 n=10+10)
```
Big (mr):
```
name                    old time/op    new time/op    delta
StreamDecompression-4    24.4ms ± 3%    23.0ms ± 2%   -5.77%  (p=0.000 n=9+9)

name                    old speed      new speed      delta
StreamDecompression-4    409MB/s ± 3%   434MB/s ± 2%   +6.11%  (p=0.000 n=9+9)
```
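For anyone wanting to reproduce this kind of comparison, a possible workflow (the exact flags here are an assumption, not necessarily what was used above) is to record the benchmark several times before and after the change and feed both files to benchstat:
```
$ go get golang.org/x/perf/cmd/benchstat
$ PAYLOAD=zstd_decompress.c go test -bench BenchmarkStreamDecompression \
    -benchmem -count 10 > old.txt
# apply the change, then:
$ PAYLOAD=zstd_decompress.c go test -bench BenchmarkStreamDecompression \
    -benchmem -count 10 > new.txt
$ benchstat old.txt new.txt
```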
Collaborator
Actually, since we now use a sync mechanism on the decompression path, do you mind adding a parallel test to your PR? Something like:
```
func TestStreamCompressionDecompressionParallel(t *testing.T) {
	for i := 0; i < 1000; i++ {
		t.Run("", func(t2 *testing.T) {
			t2.Parallel()
			TestStreamCompressionDecompression(t2)
		})
	}
}
```
should be sufficient.
Contributor
Author
Test added. Ran successfully with