MINOR: Reduce allocations in requests via buffer caching #9229

Merged
ijuma merged 15 commits into apache:trunk from ijuma:reduce-produce-allocations-lz4
May 30, 2021

@ijuma
Member

@ijuma ijuma commented Aug 29, 2020

Use a caching BufferSupplier per request handler thread so that
decompression buffers are cached if supported by the underlying
CompressionType. This achieves a similar outcome to #9220, but
with less contention.

We introduce a RequestLocal class to make it easier to introduce
new request scoped stateful instances (one example we discussed
previously was an ActionQueue that could be used to avoid
some of the complex group coordinator locking).

This is a small win for zstd (no synchronization or soft references) and
a more significant win for lz4. In particular, it reduces allocations
significantly when the number of partitions is high. The decompression
buffer size is typically 64 KB, so a produce request with 1000 partitions
results in 64 MB of allocations even if each produce batch is small (likely,
when there are so many partitions).
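
The allocation arithmetic above can be sketched in a few lines. This is a minimal, hypothetical model, not Kafka's `BufferSupplier`: `CachingBufferPool` and `simulate` are names invented for illustration, and the pool is assumed to be owned by a single request handler thread, so it needs no synchronization. It counts the bytes allocated when one request touches 1000 partitions with a 64 KB decompression buffer each, with and without reuse.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded buffer cache; an illustration, not Kafka's BufferSupplier. */
final class CachingBufferPool {
    // Released buffers, keyed by capacity. Deliberately not thread-safe:
    // each request handler thread is assumed to own its own instance.
    private final Map<Integer, Deque<ByteBuffer>> cache = new HashMap<>();
    private long allocatedBytes = 0;

    ByteBuffer get(int capacity) {
        Deque<ByteBuffer> free = cache.get(capacity);
        if (free != null && !free.isEmpty()) {
            return free.pollFirst(); // reuse a previously released buffer
        }
        allocatedBytes += capacity;
        return ByteBuffer.allocate(capacity);
    }

    void release(ByteBuffer buffer) {
        buffer.clear();
        cache.computeIfAbsent(buffer.capacity(), k -> new ArrayDeque<>()).addFirst(buffer);
    }

    long allocatedBytes() {
        return allocatedBytes;
    }

    /** Simulates decompressing one batch per partition and returns the bytes allocated. */
    static long simulate(int partitions, int bufferSize, boolean reuse) {
        CachingBufferPool pool = new CachingBufferPool();
        for (int i = 0; i < partitions; i++) {
            ByteBuffer buf = pool.get(bufferSize);
            // ... the batch for partition i would be decompressed into buf here ...
            if (reuse) {
                pool.release(buf);
            }
        }
        return pool.allocatedBytes();
    }

    public static void main(String[] args) {
        int bufferSize = 64 * 1024;
        System.out.println(simulate(1000, bufferSize, false)); // 65536000 (the ~64 MB above)
        System.out.println(simulate(1000, bufferSize, true));  // 65536 (one buffer, reused)
    }
}
```

In this model the saving comes purely from releasing the buffer before the next partition is processed, which is why confining the cache to one thread is enough.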

I did a quick producer perf local test with 5000 partitions, 1 KB record size,
1 broker, lz4 and ~0.5 for the producer compression rate metric:

Before this change:

20000000 records sent, 346314.349535 records/sec (330.27 MB/sec),
148.33 ms avg latency, 2267.00 ms max latency, 115 ms 50th, 383 ms 95th, 777 ms 99th, 1514 ms 99.9th.

After this change:

20000000 records sent, 431956.113259 records/sec (411.95 MB/sec),
117.79 ms avg latency, 1219.00 ms max latency, 99 ms 50th, 295 ms 95th, 440 ms 99th, 662 ms 99.9th.

That's a 25% throughput improvement and p999 latency was reduced to
under half (in this test).
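
As a quick check of those two figures, the sketch below recomputes them from the numbers quoted in the two runs. `ProducerPerfDelta` and `percentChange` are illustrative names, not part of Kafka or its perf tooling.

```java
import java.util.Locale;

final class ProducerPerfDelta {
    /** Relative change from before to after, in percent. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the "before" and "after" producer perf runs.
        double throughput = percentChange(346314.349535, 431956.113259);
        double p999Ratio = 662.0 / 1514.0;
        // Roughly +24.7% records/sec, and a p99.9 ratio of roughly 0.44.
        System.out.println(String.format(Locale.ROOT, "throughput change: %+.1f%%", throughput));
        System.out.println(String.format(Locale.ROOT, "p99.9 after/before: %.2f", p999Ratio));
    }
}
```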

Default arguments will be removed in a subsequent PR.

@chia7712
Member

This patch makes each request (handler) thread have a BufferSupplier to simplify concurrency handling (by contrast, #9220 offers a thread-safe BufferSupplier).

This idea is good to me :)

@chia7712
Member

Could we use ThreadLocal to hold those per-thread resources, like BufferSupplier and ActionQueue, to simplify the method arguments? The cost of ThreadLocal is low, and it makes it easy to add a new thread-local resource in the future (we wouldn't need to change a lot of method arguments).

@ijuma
Member Author

ijuma commented Aug 29, 2020

In my opinion, thread locals are most useful when one doesn't control the code. For cases like this, being explicit makes it easier to reason about and also test. Even if it's a bit more work initially.
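
To make the trade-off concrete, here is a small sketch of the two styles. Everything in it is hypothetical (`Buffers`, `CountingBuffers`, `handleImplicit`, `handleExplicit` are invented for illustration and are not Kafka APIs). It only shows why an explicit parameter tends to be easier to test: the caller can hand in a fake and observe exactly what was used.

```java
import java.nio.ByteBuffer;

final class ExplicitVsThreadLocal {
    /** Hypothetical stand-in for a per-request resource such as a buffer supplier. */
    interface Buffers {
        ByteBuffer get(int capacity);
    }

    /** Test double that records how many buffers were requested. */
    static final class CountingBuffers implements Buffers {
        int calls = 0;

        @Override
        public ByteBuffer get(int capacity) {
            calls++;
            return ByteBuffer.allocate(capacity);
        }
    }

    // Implicit style: the dependency is hidden in thread-local state.
    static final ThreadLocal<Buffers> CURRENT = ThreadLocal.withInitial(CountingBuffers::new);

    static int handleImplicit(int size) {
        return CURRENT.get().get(size).capacity();
    }

    // Explicit style: the caller decides which instance is used.
    static int handleExplicit(int size, Buffers buffers) {
        return buffers.get(size).capacity();
    }

    public static void main(String[] args) {
        CountingBuffers fake = new CountingBuffers();
        handleExplicit(16, fake);
        System.out.println(fake.calls); // 1: the fake saw the call
        handleImplicit(16);
        System.out.println(fake.calls); // still 1: the thread-local instance was used instead
    }
}
```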

@ijuma
Member Author

ijuma commented Aug 29, 2020

@chia7712 One option would be for me to introduce a RequestContext case class and add the BufferSupplier as one of the fields. It would be easy to extend this class with request bound elements like ActionQueue. Thoughts?

@chia7712
Member

> One option would be for me to introduce a RequestContext case class and add the BufferSupplier as one of the fields. It would be easy to extend this class with request bound elements like ActionQueue. Thoughts?

It is great! +1

@chia7712
Member

@ijuma I have closed #9220 and assigned https://issues.apache.org/jira/browse/KAFKA-10433 to you.

@chia7712
Member

@ijuma I filed a PR to zstd-jni that opens the door to reusing the byte array used for zstd compression (luben/zstd-jni@1346fc1). Also, there is a ticket (https://issues.apache.org/jira/browse/KAFKA-10470) to adopt the new zstd-jni API.

ijuma added 3 commits January 26, 2021 07:36
…e-allocations-lz4

* apache-github/trunk: (562 commits)
  MINOR: remove unused code from MessageTest (apache#9961)
  MINOR: Fix visibility of Log.{unflushedMessages, addSegment} methods (apache#9966)
  KAFKA-12229: Restore original class loader in integration tests using EmbeddedConnectCluster during shutdown  (apache#9942)
  KAFKA-12190: Fix setting of file permissions on non-POSIX filesystems (apache#9947)
  MINOR: Remove `toStruct` and `fromStruct` methods from generated protocol classes (apache#9960)
  MINOR: Fix typo in Utils#toPositive (apache#9943)
  MINOR: MessageUtil: remove some deadcode (apache#9931)
  MINOR: Update zstd-jni to 1.4.8-2 (apache#9957)
  MINOR: Revert assertion in MockProducerTest (apache#9956)
  MINOR: Optimize assertions in unit tests (apache#9955)
  MINOR: Tag `RaftEventSimulationTest` as `integration` and tweak it (apache#9925)
  MINOR: Update to Gradle 6.8.1 (apache#9953)
  MINOR: A few small group coordinator cleanups (apache#9952)
  MINOR: Upgrade ducktape to version 0.8.1  (apache#9933)
  MINOR: fix record time in test shouldWipeOutStandbyStateDirectoryIfCheckpointIsMissing (apache#9948)
  MINOR: Restore interrupt status when closing (apache#9863)
  KAFKA-10357: Extract setup of repartition topics from Streams partition assignor (apache#9848)
  KAFKA-12212; Bump Metadata API version to remove `ClusterAuthorizedOperations` fields (KIP-700) (apache#9945)
  MINOR: log 2min processing summary of StreamThread loop (apache#9941)
  MINOR: Drop enable.metadata.quorum config (apache#9934)
  ...
@ijuma ijuma marked this pull request as ready for review January 26, 2021 16:25
@ijuma
Member Author

ijuma commented Jan 26, 2021

@chia7712 One question we have to decide is whether we want to remove the default arguments in this PR or in a separate PR that is purely mechanical (no behavior changes). A lot of tests call the relevant methods, so removing the defaults would cause a lot of test changes. I am leaning towards doing it as a separate PR and maybe after the 2.8 branch is cut (to avoid disrupting other work targeting 2.8). What do you think?

@chia7712
Member

> I am leaning towards doing it as a separate PR and maybe after the 2.8 branch is cut (to avoid disrupting other work targeting 2.8).

+1

@chia7712
Member

> One option would be for me to introduce a RequestContext case class and add the BufferSupplier

@ijuma Are you leaning toward implementing this? It could help us improve the "action queue".

@ijuma
Member Author

ijuma commented Jan 26, 2021

@chia7712 I had forgotten that part of the discussion. :) Let me take a closer look at that.

…e-allocations-lz4

* apache-github/trunk: (118 commits)
  KAFKA-12327: Remove MethodHandle usage in CompressionType (apache#10123)
  KAFKA-12297: Make MockProducer return RecordMetadata with values as per contract
  MINOR: Update zstd and use classes with no finalizers (apache#10120)
  KAFKA-12326: Corrected regresion in MirrorMaker 2 executable introduced with KAFKA-10021 (apache#10122)
  KAFKA-12321 the comparison function for uuid type should be 'equals' rather than '==' (apache#10098)
  MINOR: Add FetchSnapshot API doc in KafkaRaftClient (apache#10097)
  MINOR: KIP-631 KafkaConfig fixes and improvements (apache#10114)
  KAFKA-12272: Fix commit-interval metrics (apache#10102)
  MINOR: Improve confusing admin client shutdown logging (apache#10107)
  MINOR: Add BrokerMetadataListener (apache#10111)
  MINOR: Support Raft-based metadata quorums in system tests (apache#10093)
  MINOR: add the MetaLogListener, LocalLogManager, and Controller interface. (apache#10106)
  MINOR: Introduce the KIP-500 Broker lifecycle manager (apache#10095)
  MINOR: Remove always-passing validation in TestRecordTest#testProducerRecord (apache#9930)
  KAFKA-5235: GetOffsetShell: Support for multiple topics and consumer configuration override (KIP-635) (apache#9430)
  MINOR: Prevent creating partition.metadata until ID can be written (apache#10041)
  MINOR: Add RaftReplicaManager (apache#10069)
  MINOR: Add ClientQuotaMetadataManager for processing QuotaRecord (apache#10101)
  MINOR: Rename DecommissionBrokers to UnregisterBrokers (apache#10084)
  MINOR: KafkaBroker.brokerState should be volatile instead of AtomicReference (apache#10080)
  ...

clients/src/main/java/org/apache/kafka/common/record/CompressionType.java
core/src/test/scala/unit/kafka/coordinator/group/GroupMetadataManagerTest.scala
@ijuma
Member Author

ijuma commented Feb 14, 2021

We already have a RequestContext class, so another name would be RequestLocal (similar to ThreadLocal). Thoughts @chia7712?

@chia7712
Member

> We already have a RequestContext class, so another name would be RequestLocal (similar to ThreadLocal).

'RequestLocal' sounds good to me.

ijuma added 2 commits April 3, 2021 22:36
…e-allocations-lz4

* apache-github/trunk: (243 commits)
  KAFKA-12590: Remove deprecated kafka.security.auth.Authorizer, SimpleAclAuthorizer and related classes in 3.0 (apache#10450)
  KAFKA-3968: fsync the parent directory of a segment file when the file is created (apache#10405)
  KAFKA-12283: disable flaky testMultipleWorkersRejoining to stabilize build (apache#10408)
  MINOR: remove KTable.to from the docs (apache#10464)
  MONOR: Remove redudant LocalLogManager (apache#10325)
  MINOR: support ImplicitLinkedHashCollection#sort (apache#10456)
  KAFKA-12587 Remove KafkaPrincipal#fromString for 3.0 (apache#10447)
  KAFKA-12426: Missing logic to create partition.metadata files in RaftReplicaManager (apache#10282)
  MINOR: Improve reproducability of raft simulation tests (apache#10422)
  KAFKA-12474: Handle failure to write new session keys gracefully (apache#10396)
  KAFKA-12593: Fix Apache License headers (apache#10452)
  MINOR: Fix typo in MirrorMaker v2 documentation (apache#10433)
  KAFKA-12600: Remove deprecated config value `default` for client config `client.dns.lookup` (apache#10458)
  KAFKA-12952: Remove deprecated LogConfig.Compact (apache#10451)
  Initial commit (apache#10454)
  KAFKA-12575: Eliminate Log.isLogDirOffline boolean attribute (apache#10430)
  KAFKA-8405; Remove deprecated `kafka-preferred-replica-election` command (apache#10443)
  MINOR: Fix docs for end-to-end record latency metrics (apache#10449)
  MINOR Replaced File with Path in LogSegmentData. (apache#10424)
  KAFKA-12583: Upgrade netty to 4.1.62.Final
  ...
@ijuma ijuma force-pushed the reduce-produce-allocations-lz4 branch 2 times, most recently from ab60692 to 3e5b48a Compare April 5, 2021 14:53
@ijuma ijuma force-pushed the reduce-produce-allocations-lz4 branch from 3e5b48a to adac142 Compare April 5, 2021 14:55
@ijuma
Member Author

ijuma commented Apr 5, 2021

@chia7712 I introduced RequestLocal as discussed. Does this seem reasonable to you? If so, I propose the following next steps:

  1. In this PR, provide utility methods in RequestLocal for the two common defaults: ThreadLocalCaching and NoCaching. The latter should be used when the usage is not guaranteed to be within the same thread. In the future, we can consider a ThreadSafeCaching/GlobalCaching option, if that makes sense.

  2. In a separate PR, remove the default arguments. This will result in a lot of test changes, but no change in behavior. So, it probably makes sense to review separately.
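
The first step above can be sketched roughly as follows. This is a self-contained Java illustration under stated assumptions, not the code in this PR: the actual `RequestLocal` is a Scala class wrapping Kafka's `BufferSupplier`, whereas `Buffers`, `ThreadConfinedCache`, `noCaching()` and `threadLocalCaching()` here are hypothetical stand-ins chosen to mirror the NoCaching and ThreadLocalCaching defaults being discussed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

final class RequestLocalSketch {
    /** Hypothetical stand-in for a buffer supplier. */
    interface Buffers {
        ByteBuffer get(int capacity);
        void release(ByteBuffer buffer);
    }

    /** Stateless, so a single shared instance can be used from any thread. */
    static final Buffers NO_CACHING = new Buffers() {
        @Override
        public ByteBuffer get(int capacity) {
            return ByteBuffer.allocate(capacity);
        }

        @Override
        public void release(ByteBuffer buffer) {
            // Nothing is cached; the buffer is left to the garbage collector.
        }
    };

    /** Caches released buffers without locking; must stay confined to one thread. */
    static final class ThreadConfinedCache implements Buffers {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();

        @Override
        public ByteBuffer get(int capacity) {
            ByteBuffer cached = free.pollFirst();
            // Reuse only on an exact capacity match; a mismatched buffer is simply dropped.
            if (cached != null && cached.capacity() == capacity) {
                return cached;
            }
            return ByteBuffer.allocate(capacity);
        }

        @Override
        public void release(ByteBuffer buffer) {
            buffer.clear();
            free.addFirst(buffer);
        }
    }

    /** Holder for request-scoped state; further fields could be added later. */
    static final class RequestLocal {
        final Buffers buffers;

        private RequestLocal(Buffers buffers) {
            this.buffers = buffers;
        }

        /** For callers that cannot guarantee they stay on one thread. */
        static RequestLocal noCaching() {
            return new RequestLocal(NO_CACHING);
        }

        /** For callers such as a request handler thread that owns the instance. */
        static RequestLocal threadLocalCaching() {
            return new RequestLocal(new ThreadConfinedCache());
        }
    }

    public static void main(String[] args) {
        RequestLocal local = RequestLocal.threadLocalCaching();
        ByteBuffer first = local.buffers.get(64 * 1024);
        local.buffers.release(first);
        System.out.println(local.buffers.get(64 * 1024) == first); // true: the buffer was reused
    }
}
```

Keeping the caching variant free of synchronization is what makes the single-thread restriction matter; a thread-safe variant would be a separate implementation behind the same holder.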

Thoughts?

@chia7712
Member

chia7712 commented Apr 6, 2021

> I introduced RequestLocal as discussed. Does this seem reasonable to you? If so, I propose the following next steps:

That LGTM. Separately, the memory utility used by KafkaRequestHandler (BufferSupplier) is different from the one used by Processor (MemoryPool). Could we unify the interface? It seems to me that RequestLocal could be applied to Processor as well in the future.

@ijuma
Member Author

ijuma commented Apr 6, 2021

> Could we unify the interface?

@chia7712 Yes, it's worth exploring. I think MemoryPool is intended to be a thread-safe cache, so it's not trivial, but it may be possible. Are you interested in looking into that?

@ijuma
Member Author

ijuma commented Apr 6, 2021

@chia7712 I updated the PR. Please review when you have a chance.

@chia7712
Member

chia7712 commented Apr 7, 2021

> Yes, it's worth exploring. I think MemoryPool is intended to be a thread-safe cache, so it's not trivial, but it may be possible. Are you interested in looking into that?

Yes, that is an interesting issue to me. I filed a JIRA (https://issues.apache.org/jira/browse/KAFKA-12627). It can be a follow-up to this PR.

Member

@chia7712 chia7712 left a comment

@ijuma Thanks for this improvement. Overall +1. Could you run a performance test?

import org.apache.kafka.common.utils.BufferSupplier

object RequestLocal {
val NoCaching: RequestLocal = RequestLocal(BufferSupplier.create())
Member

NoCaching should use BufferSupplier.NO_CACHING, right?

Member Author

Indeed.

apis.handle(request, requestLocal)
} catch {
case e: FatalExitError =>
shutdownComplete.countDown()
Member

Should we call completeShutdown() rather than shutdownComplete.countDown()?

Member Author

Yes.

@ijuma
Member Author

ijuma commented Apr 7, 2021

@chia7712 When you say performance test, you mean the ducktape ones?

@chia7712
Member

chia7712 commented Apr 7, 2021

> When you say performance test, you mean the ducktape ones?

Yep, maybe benchmark_test.py (although I feel the benefit would be more obvious in long-running production workloads).

ijuma added 2 commits May 15, 2021 18:42
…e-allocations-lz4

* apache-github/trunk: (155 commits)
  KAFKA-12728: Upgrade gradle to 7.0.2 and shadow to 7.0.0 (apache#10606)
  KAFKA-12778: Fix QuorumController request timeouts and electLeaders (apache#10688)
  KAFKA-12754: Improve endOffsets for TaskMetadata (apache#10634)
  Rework on KAFKA-3968: fsync the parent directory of a segment file when the file is created (apache#10680)
  MINOR: set replication.factor to 1 to make StreamsBrokerCompatibilityService work with old broker (apache#10673)
  MINOR: prevent cleanup() from being called while Streams is still shutting down (apache#10666)
  KAFKA-8326: Introduce List Serde (apache#6592)
  KAFKA-12697: Add Global Topic and Partition count metrics to the Quorum Controller (apache#10679)
  KAFKA-12648: MINOR - Add TopologyMetadata.Subtopology class for subtopology metadata (apache#10676)
  MINOR: Update jacoco to 0.8.7 for JDK 16 support (apache#10654)
  MINOR: exclude all `src/generated` and `src/generated-test` (apache#10671)
  KAFKA-12772: Move all transaction state transition rules into their states (apache#10667)
  KAFKA-12758 Added `server-common` module to have server side common classes.  (apache#10638)
  MINOR Removed copying storage libraries specifically as they are already copied. (apache#10647)
  KAFKA-5876: KIP-216 Part 4, Apply InvalidStateStorePartitionException for Interactive Queries (apache#10657)
  KAFKA-12747: Fix flakiness in shouldReturnUUIDsWithStringPrefix (apache#10643)
  MINOR: remove unnecessary placeholder from WorkerSourceTask#recordSent (apache#10659)
  MINOR: Remove unused `scalatest` definition from `dependencies.gradle` (apache#10655)
  MINOR: checkstyle version upgrade: 8.20 -> 8.36.2 (apache#10656)
  KAFKA-12464: minor code cleanup and additional logging in constrained sticky assignment (apache#10645)
  ...
…e-allocations-lz4

* apache-github/trunk: (43 commits)
  KAFKA-12800: Configure generator to fail on trailing JSON tokens (apache#10717)
  MINOR: clarify message ordering with max in-flight requests and idempotent producer (apache#10690)
  MINOR: Add log identifier/prefix printing in Log layer static functions (apache#10742)
  MINOR: update java doc for deprecated methods (apache#10722)
  MINOR: Fix deprecation warnings in SlidingWindowedCogroupedKStreamImplTest (apache#10703)
  KAFKA-12499: add transaction timeout verification (apache#10482)
  KAFKA-12620 Allocate producer ids on the controller (apache#10504)
  MINOR: Kafka Streams code samples formating unification (apache#10651)
  KAFKA-12808: Remove Deprecated Methods under StreamsMetrics (apache#10724)
  KAFKA-12522: Cast SMT should allow null value records to pass through (apache#10375)
  KAFKA-12820: Upgrade maven-artifact dependency to resolve CVE-2021-26291
  HOTFIX: fix checkstyle issue in KAFKA-12697
  KAFKA-12697: Add OfflinePartitionCount and PreferredReplicaImbalanceCount metrics to Quorum Controller (apache#10572)
  KAFKA-12342: Remove MetaLogShim and use RaftClient directly (apache#10705)
  KAFKA-12779: KIP-740, Clean up public API in TaskId and fix TaskMetadata#taskId() (apache#10735)
  KAFKA-12814: Remove Deprecated Method StreamsConfig getConsumerConfigs (apache#10737)
  MINOR: Eliminate redundant functions in LogTest suite (apache#10732)
  MINOR: Remove unused maxProducerIdExpirationMs parameter in Log constructor (apache#10723)
  MINOR: Updating files with release 2.7.1 (apache#10660)
  KAFKA-12809: Remove deprecated methods of Stores factory (apache#10729)
  ...
@ijuma
Member Author

ijuma commented May 29, 2021

This PR:

test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 16.243 seconds
{"records_per_sec": 2044571.6622, "mb_per_sec": 194.9855}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 19.227 seconds
{"records_per_sec": 1779992.88, "mb_per_sec": 169.7533}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 13.064 seconds
{"producer": {"records_per_sec": 402868.423173, "mb_per_sec": 38.42}, "consumer": {"records_per_sec": 408363.28, "mb_per_sec": 38.9446}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 12.112 seconds
{"producer": {"records_per_sec": 347886.588972, "mb_per_sec": 33.18}, "consumer": {"records_per_sec": 352534.7247, "mb_per_sec": 33.6203}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   51.120 seconds
{"latency_50th_ms": 0.0, "latency_99th_ms": 3.0, "latency_999th_ms": 8.0}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   45.992 seconds
{"latency_50th_ms": 0.0, "latency_99th_ms": 3.0, "latency_999th_ms": 9.0}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 11.957 seconds
{"0": {"records_per_sec": 400994.466276, "mb_per_sec": 38.24}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 12.859 seconds
{"0": {"records_per_sec": 366716.784627, "mb_per_sec": 34.97}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=lz4.message_size=10
status:     PASS
run time:   55.828 seconds
{"records_per_sec": 1101318.782309, "mb_per_sec": 10.5}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=lz4.message_size=100
status:     PASS
run time:   44.917 seconds
{"records_per_sec": 373345.479833, "mb_per_sec": 35.6}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=lz4.message_size=1000
status:     PASS
run time:   45.609 seconds
{"records_per_sec": 63912.857143, "mb_per_sec": 60.95}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=lz4.message_size=10000
status:     PASS
run time:   45.665 seconds
{"records_per_sec": 8099.57755, "mb_per_sec": 77.24}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=lz4.message_size=100000
status:     PASS
run time:   42.768 seconds
{"records_per_sec": 1127.731092, "mb_per_sec": 107.55}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=zstd.message_size=10
status:     PASS
run time:   55.771 seconds
{"records_per_sec": 1051286.284953, "mb_per_sec": 10.03}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=zstd.message_size=100
status:     PASS
run time:   48.832 seconds
{"records_per_sec": 331319.921007, "mb_per_sec": 31.6}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=zstd.message_size=1000
status:     PASS
run time:   46.853 seconds
{"records_per_sec": 67615.617128, "mb_per_sec": 64.48}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=zstd.message_size=10000
status:     PASS
run time:   43.870 seconds
{"records_per_sec": 7647.293447, "mb_per_sec": 72.93}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.compression_type=zstd.message_size=100000
status:     PASS
run time:   43.045 seconds
{"records_per_sec": 1222.222222, "mb_per_sec": 116.56}
--------------------------------------------------------------------------------

trunk:

test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 20.214 seconds
{"records_per_sec": 1927153.5941, "mb_per_sec": 183.7877}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 16.091 seconds
{"records_per_sec": 1754693.8059, "mb_per_sec": 167.3406}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 3.464 seconds
{"producer": {"records_per_sec": 472009.817804, "mb_per_sec": 45.01}, "consumer": {"records_per_sec": 480676.7929, "mb_per_sec": 45.8409}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_and_consumer.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 12.971 seconds
{"producer": {"records_per_sec": 346392.323946, "mb_per_sec": 33.03}, "consumer": {"records_per_sec": 350741.8189, "mb_per_sec": 33.4493}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   48.883 seconds
{"latency_999th_ms": 9.0, "latency_99th_ms": 3.0, "latency_50th_ms": 0.0}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   47.810 seconds
{"latency_999th_ms": 9.0, "latency_99th_ms": 3.0, "latency_50th_ms": 0.0}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=lz4
status:     PASS
run time:   1 minute 6.919 seconds
{"0": {"records_per_sec": 426894.34365, "mb_per_sec": 40.71}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=zstd
status:     PASS
run time:   1 minute 8.106 seconds
{"0": {"records_per_sec": 369617.445943, "mb_per_sec": 35.25}}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=lz4.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=10
status:     PASS
run time:   54.631 seconds
{"records_per_sec": 1138306.504961, "mb_per_sec": 10.86}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=lz4.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=100
status:     PASS
run time:   49.852 seconds
{"records_per_sec": 429909.352979, "mb_per_sec": 41.0}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=lz4.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=1000
status:     PASS
run time:   46.940 seconds
{"records_per_sec": 67276.691729, "mb_per_sec": 64.16}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=lz4.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=10000
status:     PASS
run time:   44.910 seconds
{"records_per_sec": 8114.26844, "mb_per_sec": 77.38}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=lz4.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=100000
status:     PASS
run time:   45.013 seconds
{"records_per_sec": 1166.956522, "mb_per_sec": 111.29}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=zstd.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=10
status:     PASS
run time:   55.821 seconds
{"records_per_sec": 1123630.975303, "mb_per_sec": 10.72}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=zstd.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=100
status:     PASS
run time:   46.809 seconds
{"records_per_sec": 342043.068298, "mb_per_sec": 32.62}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=zstd.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=1000
status:     PASS
run time:   44.890 seconds
{"records_per_sec": 63012.676056, "mb_per_sec": 60.09}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=zstd.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=10000
status:     PASS
run time:   48.846 seconds
{"records_per_sec": 7501.9564, "mb_per_sec": 71.54}
--------------------------------------------------------------------------------
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_producer_throughput.acks=1.compression_type=zstd.security_protocol=PLAINTEXT.topic=topic-replication-factor-three.message_size=100000
status:     PASS
run time:   46.991 seconds
{"records_per_sec": 1142.12766, "mb_per_sec": 108.92}
--------------------------------------------------------------------------------

@ijuma
Member Author

ijuma commented May 30, 2021

The results are similar for the ducktape benchmarks since the bottleneck is elsewhere. In the PR description, I included the results for a workload that shows significant improvement with these changes. Also, the following allocation profiles show that the lz4 buffer allocations dominate trunk and are gone in this PR:

trunk:
[image: allocation profile for trunk]

this PR:
[image: allocation profile for this PR]

So, I think we can go ahead and merge this.

@ijuma ijuma merged commit 6b005b2 into apache:trunk May 30, 2021
@ijuma ijuma deleted the reduce-produce-allocations-lz4 branch May 30, 2021 19:16