Adding buffer pools for stream decompression #69

Merged
Viq111 merged 3 commits into DataDog:1.x from TriggerMail:1.x
Sep 25, 2019
Conversation

@dangermike
Contributor

Digging into pprof, a lot of time was being spent in allocations and in GC. The allocations were mostly concentrated in `NewReaderDict`, where two `[]byte` slices are allocated; the sizes of those slices are based on compile-time decisions in the zstd C library. To improve that situation, two changes were made (sketched below):

* The sizes of those buffers are read, cast, and checked once.
* The buffers come from a pair of `sync.Pool`s in the `NewReaderDict` factory method and are returned to the pools in `reader.Close`.
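
A minimal sketch of the pooling approach, assuming fixed buffer sizes; the identifiers below (`srcPool`, `dstPool`, `newReader`) are illustrative, not the PR's exact names, and in the real code the sizes come from the zstd C library rather than constants:

```
package zstd

import "sync"

const (
	srcBufferSize = 128 * 1024 // placeholder; the real value comes from the zstd C library
	dstBufferSize = 128 * 1024 // placeholder; the real value comes from the zstd C library
)

var (
	srcPool = sync.Pool{New: func() interface{} { return make([]byte, srcBufferSize) }}
	dstPool = sync.Pool{New: func() interface{} { return make([]byte, dstBufferSize) }}
)

type reader struct {
	compressionBuffer   []byte
	decompressionBuffer []byte
	// ... underlying io.Reader, zstd decompression context, etc.
}

// NewReaderDict-style constructor: buffers are taken from the pools
// instead of being allocated on every call.
func newReader() *reader {
	return &reader{
		compressionBuffer:   srcPool.Get().([]byte),
		decompressionBuffer: dstPool.Get().([]byte),
	}
}

// Close returns the buffers to their pools so later readers can reuse them.
func (r *reader) Close() error {
	srcPool.Put(r.compressionBuffer)
	dstPool.Put(r.decompressionBuffer)
	r.compressionBuffer = nil
	r.decompressionBuffer = nil
	return nil
}
```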

Using the following test:
```
$ PAYLOAD=zstd_decompress.c go test \
    -bench BenchmarkStreamDecompression \
    -benchtime=5s \
    -benchmem
```

The number of bytes allocated per decompression dropped to roughly 1/705 of the previous value:
```
pkg: github.com/TriggerMail/zstd
BenchmarkStreamDecompression      100000        121617 ns/op     595.48 MB/s         384 B/op         10 allocs/op

pkg: github.com/DataDog/zstd
BenchmarkStreamDecompression       50000        156142 ns/op     463.81 MB/s      270640 B/op         10 allocs/op
```

Note that this improvement covers only Go-managed heap memory, not the additional heap allocations made when creating the zstd context. Because those are freed explicitly in `reader.Close`, the remaining concern there is heap fragmentation.
@Viq111 (Collaborator) left a comment:

Great change!

Indeed, I ran the benchmark locally as well (FYI, you can use benchstat to compare before/after results easily) and it looks promising for both small and large payloads:

Small (zstd_decompress.c):

```
name                   old time/op   new time/op   delta
StreamDecompression-4    198µs ± 9%    120µs ± 3%  -39.76%  (p=0.000 n=10+10)

name                   old speed     new speed     delta
StreamDecompression-4  366MB/s ±10%  606MB/s ± 3%  +65.50%  (p=0.000 n=10+10)
```

Big (mr):

```
name                   old time/op   new time/op   delta
StreamDecompression-4   24.4ms ± 3%   23.0ms ± 2%   -5.77%  (p=0.000 n=9+9)

name                   old speed      new speed      delta
StreamDecompression-4   409MB/s ± 3%   434MB/s ± 2%   +6.11%  (p=0.000 n=9+9)
```
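
For reference, a typical benchstat workflow that produces comparison tables like the ones above; the output file names and the `-count` value are illustrative:

```
$ PAYLOAD=zstd_decompress.c go test -bench BenchmarkStreamDecompression -benchtime=5s -benchmem -count=10 > old.txt
# ...apply the buffer-pool change, then...
$ PAYLOAD=zstd_decompress.c go test -bench BenchmarkStreamDecompression -benchtime=5s -benchmem -count=10 > new.txt
$ benchstat old.txt new.txt
```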

@Viq111
Collaborator

Viq111 commented Sep 25, 2019

Actually, since we now use a sync mechanism on the decompression path, do you mind adding a parallel test to your PR?

Something like:

```
func TestStreamCompressionDecompressionParallel(t *testing.T) {
	for i := 0; i < 1000; i++ {
		t.Run("", func(t2 *testing.T) {
			t2.Parallel()
			TestStreamCompressionDecompression(t2)
		})
	}
}
```

should be sufficient.

@dangermike
Contributor Author

Test added. Ran successfully with -race. Thanks!
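
For reference, a race-detector run of the new test might look like this (the test name is taken from the snippet above; flags are standard `go test` options):

```
$ go test -race -run TestStreamCompressionDecompressionParallel
```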
