
pkg/profiler: Use pool of gzip writers#1161

Merged
kakkoyun merged 2 commits into parca-dev:main from marselester:improve-gzip-perf
Jan 12, 2023

Conversation

@marselester (Contributor) commented Dec 21, 2022

klauspost's stateless gzip compressor allowed us to reduce CPU consumption by 20%, see #1065.

After profiling Parca Agent with VTune, I found an elevated CPI rate (124.4) in flate/stateless.go, see klauspost/compress#717. The author recommends using a pool of gzip writers to get a 2x-3x speedup.

New (with a pool of gzip writers):
(pprof) top20 -cum
Showing nodes accounting for 175.04MB, 36.80% of 475.70MB total
Dropped 155 nodes (cum <= 2.38MB)
Showing top 20 nodes out of 118
      flat  flat%   sum%        cum   cum%
         0     0%     0%   189.97MB 39.93%  github.com/godbus/dbus/v5.(*Conn).inWorker
   31.86MB  6.70%  6.70%   189.97MB 39.93%  github.com/godbus/dbus/v5.(*unixTransport).ReadMessage
         0     0%  6.70%   157.60MB 33.13%  github.com/godbus/dbus/v5.(*decoder).Decode
  111.01MB 23.34% 30.04%   157.60MB 33.13%  github.com/godbus/dbus/v5.(*decoder).decode
         0     0% 30.04%   157.10MB 33.03%  github.com/godbus/dbus/v5.decodeMessageBody
         0     0% 30.04%   133.76MB 28.12%  github.com/oklog/run.(*Group).Run.func1
         0     0% 30.04%   133.76MB 28.12%  github.com/parca-dev/parca-agent/pkg/profiler/cpu.(*CPU).Run
         0     0% 30.04%   133.76MB 28.12%  main.run.func6
         0     0% 30.04%   133.76MB 28.12%  main.run.func6.1
         0     0% 30.04%   133.76MB 28.12%  runtime/pprof.Do
    6.03MB  1.27% 31.30%   130.23MB 27.38%  github.com/coreos/go-systemd/v22/dbus.(*Conn).SubscribeUnitsCustom.func1
         0     0% 31.30%   124.20MB 26.11%  github.com/coreos/go-systemd/v22/dbus.(*Conn).ListUnits (inline)
         0     0% 31.30%   124.20MB 26.11%  github.com/coreos/go-systemd/v22/dbus.(*Conn).ListUnitsContext
   26.13MB  5.49% 36.80%   122.20MB 25.69%  github.com/coreos/go-systemd/v22/dbus.(*Conn).listUnitsInternal
         0     0% 36.80%    96.06MB 20.19%  github.com/godbus/dbus/v5.Store
         0     0% 36.80%    96.06MB 20.19%  github.com/godbus/dbus/v5.store
         0     0% 36.80%    96.06MB 20.19%  github.com/godbus/dbus/v5.storeInterfaces
         0     0% 36.80%    96.06MB 20.19%  github.com/godbus/dbus/v5.storeSlice
         0     0% 36.80%    65.06MB 13.68%  github.com/godbus/dbus/v5.(*Call).Store
         0     0% 36.80%    65.06MB 13.68%  github.com/godbus/dbus/v5.storeSliceIntoSlice

ROUTINE ======================== github.com/parca-dev/parca-agent/pkg/profiler.(*RemoteProfileWriter).Write in pkg/profiler/profile_writer.go
         0    13.52MB (flat, cum)  2.84% of Total
         .          .     78:// Write sends the profile using the designated write client.
         .          .     79:func (rw *RemoteProfileWriter) Write(ctx context.Context, labels model.LabelSet, prof *profile.Profile) error {
         .          .     80:	buf := bytes.NewBuffer(nil)
         .          .     81:	zw := rw.pool.Get().(*gzip.Writer)
         .          .     82:	zw.Reset(buf)
         .     7.81MB     83:	if err := prof.WriteUncompressed(zw); err != nil {
         .          .     84:		rw.pool.Put(zw)
         .          .     85:		return err
         .          .     86:	}
         .     5.71MB     87:	zw.Close()
         .          .     88:	rw.pool.Put(zw)
         .          .     89:
         .          .     90:	_, err := rw.profileStoreClient.WriteRaw(ctx, &profilestorepb.WriteRawRequest{
         .          .     91:		Normalized: true,
         .          .     92:		Series: []*profilestorepb.RawProfileSeries{{
Old (stateless compression):
(pprof) top20 -cum
Showing nodes accounting for 104.70MB, 40.74% of 257.02MB total
Dropped 109 nodes (cum <= 1.29MB)
Showing top 20 nodes out of 125
      flat  flat%   sum%        cum   cum%
         0     0%     0%   100.85MB 39.24%  github.com/oklog/run.(*Group).Run.func1
         0     0%     0%   100.85MB 39.24%  github.com/parca-dev/parca-agent/pkg/profiler/cpu.(*CPU).Run
         0     0%     0%   100.85MB 39.24%  main.run.func6
         0     0%     0%   100.85MB 39.24%  main.run.func6.1
         0     0%     0%   100.85MB 39.24%  runtime/pprof.Do
         0     0%     0%    73.71MB 28.68%  github.com/godbus/dbus/v5.(*Conn).inWorker
    8.58MB  3.34%  3.34%    73.71MB 28.68%  github.com/godbus/dbus/v5.(*unixTransport).ReadMessage
    9.11MB  3.55%  6.88%    64.13MB 24.95%  github.com/godbus/dbus/v5.DecodeMessage
         0     0%  6.88%    55.02MB 21.41%  github.com/godbus/dbus/v5.(*decoder).Decode
      50MB 19.46% 26.34%    55.02MB 21.41%  github.com/godbus/dbus/v5.(*decoder).decode
    3.02MB  1.17% 27.51%    54.02MB 21.02%  github.com/coreos/go-systemd/v22/dbus.(*Conn).SubscribeUnitsCustom.func1
         0     0% 27.51%       51MB 19.84%  github.com/coreos/go-systemd/v22/dbus.(*Conn).ListUnits (inline)
         0     0% 27.51%       51MB 19.84%  github.com/coreos/go-systemd/v22/dbus.(*Conn).ListUnitsContext
   12.49MB  4.86% 32.37%       50MB 19.45%  github.com/coreos/go-systemd/v22/dbus.(*Conn).listUnitsInternal
         0     0% 32.37%    44.89MB 17.47%  github.com/parca-dev/parca-agent/pkg/metadata.(*StatelessProvider).Labels
         0     0% 32.37%    44.89MB 17.47%  github.com/parca-dev/parca-agent/pkg/metadata/labels.(*Manager).LabelSet
         0     0% 32.37%    44.89MB 17.47%  github.com/parca-dev/parca-agent/pkg/metadata/labels.(*Manager).labelSet
         0     0% 32.37%    44.39MB 17.27%  github.com/parca-dev/parca-agent/pkg/metadata.Compiler.func1
         0     0% 32.37%    42.50MB 16.54%  debug/elf.Open
   21.50MB  8.37% 40.74%    41.50MB 16.15%  debug/elf.NewFile

ROUTINE ======================== github.com/parca-dev/parca-agent/pkg/profiler.(*RemoteProfileWriter).Write in pkg/profiler/profile_writer.go
         0        2MB (flat, cum)  0.78% of Total
         .          .     73:	buf := bytes.NewBuffer(nil)
         .          .     74:	zw, err := gzip.NewWriterLevel(buf, gzip.StatelessCompression)
         .          .     75:	if err != nil {
         .          .     76:		return err
         .          .     77:	}
         .        1MB     78:	if err = prof.WriteUncompressed(zw); err != nil {
         .          .     79:		zw.Close()
         .          .     80:		return err
         .          .     81:	}
         .          .     82:	zw.Close()
         .          .     83:
         .          .     84:	_, err = rw.profileStoreClient.WriteRaw(ctx, &profilestorepb.WriteRawRequest{
         .          .     85:		Normalized: true,
         .          .     86:		Series: []*profilestorepb.RawProfileSeries{{
         .        1MB     87:			Labels: &profilestorepb.LabelSet{Labels: convertLabels(labels)},
         .          .     88:			Samples: []*profilestorepb.RawSample{{
         .          .     89:				RawProfile: buf.Bytes(),
         .          .     90:			}},
         .          .     91:		}},
         .          .     92:	})

I have yet to check the CPI rate after introducing the pool.

@marselester marselester requested a review from a team as a code owner December 21, 2022 19:37
@marselester (Contributor, Author)

The gzip compressor no longer shows up at the top of the CPI rate list.

[Screenshot: VTune CPI results, 2022-12-21 21:25]

@zdyj3170101136 (Contributor) commented Dec 22, 2022

Maybe sending uncompressed data is OK? The agent and the server both run on the same network, and hundreds of KB of data would not cost much network bandwidth.

@brancz (Member) commented Dec 22, 2022

It's a trade-off. If it were only compiled processes I would agree, but we also resolve perf maps and kernel symbols, and for those kinds of workloads compression is a good idea. I'd be OK with adding a flag to turn compression off for people who want that in their setup, but the default should remain compression on. We went with exactly the same approach in Prometheus.

@kakkoyun (Contributor) left a comment

This is very close to merging. Let's not ignore errors, and then we can merge this PR.

Thanks for this work.

return &RemoteProfileWriter{
	profileStoreClient: profileStoreClient,
	pool: sync.Pool{New: func() interface{} {
		z, _ := gzip.NewWriterLevel(nil, gzip.BestSpeed)
Contributor:
Let's not ignore errors :)

Contributor Author:
It's safe enough, because gzip.NewWriter() does the same thing: https://github.com/klauspost/compress/blob/master/gzip/gzip.go#L58. There will be no error, since gzip.BestSpeed is a valid constant level.

Contributor:
I hear you; however, the underlying implementation can change, and I don't feel comfortable ignoring errors. We can change the API to return the error to the caller and let it handle it.

Contributor Author:
Sounds good.

@marselester (Contributor, Author) commented Jan 5, 2023:
Oh, sync.Pool.New accepts a func() interface{}, so either an error or a *gzip.Writer can be returned. The value returned from rw.pool.Get() can then be checked.

Contributor Author:
If there is an error, the pool would return something other than a *gzip.Writer, and the type assertion rw.pool.Get().(*gzip.Writer) would panic, so the error would be spotted immediately. Additionally, we could pass a logger to NewRemoteProfileWriter() and log the error, making it easier to figure out why sync.Pool.New did not return a writer.

@marselester marselester force-pushed the improve-gzip-perf branch 3 times, most recently from 46219dd to b10ec6e Compare January 5, 2023 17:53
klauspost's stateless gzip compressor allowed us to reduce
CPU consumption by 20%.
After profiling Parca Agent with VTune, I found
an elevated CPI rate (124.4) in flate/stateless.go.
The author recommends using a pool of writers
to get a 2x-3x speedup.
@kakkoyun (Contributor) left a comment:

LGTM 👍

@kakkoyun kakkoyun merged commit 74ccf1e into parca-dev:main Jan 12, 2023