KAFKA-10074: Improve performance of matchingAcls #8769
Merged
ijuma merged 2 commits into apache:trunk on Jun 1, 2020
This PR reduces allocations by using a plain old `foreach` in `matchingAcls` and improves `AclSeqs.find` to only search the inner collections that are required to find a match (instead of searching all of them).

A recent change (90bbeed) in `matchingAcls` to remove `filterKeys` in favor of filtering inside `flatMap` caused a performance regression in cases where there is a large number of topics and prefix ACLs and `TreeMap.from/to` filtering is ineffective. In such cases, we rely on string comparisons to exclude entries from the ACL cache that are not relevant.

This issue is not present in any release yet, so we should include the simple fix in the 2.6 branch.

The original benchmark did not show a performance difference, so I adjusted the benchmark to stress the relevant code more. More specifically, `aclCacheSnapshot.from(...).to(...)` returns nearly 20000 entries where each map value contains 1000 AclEntries. Out of the 200k AclEntries, only 1050 are retained due to the `startsWith` filtering. This is the case where the implementation in master is least efficient when compared to the previous version and the version in this PR.

The adjusted benchmark results for testAuthorizer are 4.532 ms for master, 2.903 ms for the previous version and 2.877 ms for this PR. Normalized allocation rate was 593 KB/op for master, 597 KB/op for the previous version and 101 KB/op for this PR. Full results follow:

master with adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 680.805 | ± 44.318 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 549.879 | ± 36.259 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411457042.000 | ± 4805.461 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 331.110 | ± 95.821 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 247799480.320 | ± 72877192.319 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 0.891 | ± 3.183 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 667593.387 | ± 2369888.357 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 28.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3458.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 4.532 | ± 0.546 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 119.036 | ± 14.261 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 593524.310 | ± 22.452 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 117.091 | ± 1008.188 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 598574.303 | ± 5153905.271 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 0.034 | ± 0.291 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 173.001 | ± 1489.593 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | 1.000 | | counts |
| AclAuthorizerBenchmark.testAuthorizer:·gc.time | 50 | 200000 | avgt | 5 | 13.000 | | ms |

master with filterKeys like 90bbeed and adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 729.163 | ± 20.842 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 513.005 | ± 13.966 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411459778.400 | ± 3178.045 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 307.041 | ± 94.544 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 246385400.686 | ± 82294899.881 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 1.571 | ± 2.590 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 1258291.200 | ± 2063669.849 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 33.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3266.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 2.903 | ± 0.175 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 187.088 | ± 11.301 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 597962.743 | ± 14.237 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 118.602 | ± 1021.202 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 383359.632 | ± 3300842.044 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | 1.000 | | counts |
| AclAuthorizerBenchmark.testAuthorizer:·gc.time | 50 | 200000 | avgt | 5 | 14.000 | | ms |

This PR with adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 706.774 | ± 32.353 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 529.879 | ± 25.416 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411458751.497 | ± 4424.187 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 310.559 | ± 112.310 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 241364219.611 | ± 97317733.967 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Old_Gen | 50 | 200000 | avgt | 5 | 0.690 | ± 5.937 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Old_Gen.norm | 50 | 200000 | avgt | 5 | 531278.507 | ± 4574468.166 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 2.550 | ± 17.243 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 1969325.592 | ± 13278191.648 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 32.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3489.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 2.877 | ± 0.530 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 31.963 | ± 5.912 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 101057.225 | ± 9.468 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | ≈ 0 | | counts |
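To make the `flatMap` versus `foreach` point in the description concrete, here is a minimal sketch. It is not the Kafka code: `MatchingSketch`, `viaFlatMap`, `viaForeach`, and the `TreeMap[String, Seq[String]]` stand-in for the ACL cache are all hypothetical names chosen for illustration. Both functions keep only the entries whose key is a prefix of the resource name; the second appends into a single builder instead of going through `flatMap`.

```scala
import scala.collection.immutable.TreeMap

object MatchingSketch {
  // Hypothetical stand-in for the ACL cache: prefix -> entries for that prefix.
  type Cache = TreeMap[String, Seq[String]]

  // Filtering inside flatMap: the filter and the concatenation are expressed
  // through the generic flatMap machinery.
  def viaFlatMap(cache: Cache, resource: String): List[String] =
    cache.iterator.flatMap { case (prefix, entries) =>
      if (resource.startsWith(prefix)) entries else Nil
    }.toList

  // Plain foreach: matching entries are appended to one builder and
  // non-matching keys are skipped without producing anything.
  def viaForeach(cache: Cache, resource: String): List[String] = {
    val builder = List.newBuilder[String]
    cache.foreach { case (prefix, entries) =>
      if (resource.startsWith(prefix)) builder ++= entries
    }
    builder.result()
  }
}
```

The two functions return the same result; how much allocation the `foreach` form saves in practice depends on the collection types involved, which is what the benchmark numbers in this PR measure for the real code.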
chia7712 reviewed on Jun 1, 2020
    def find(p: AclEntry => Boolean): Option[AclEntry] = classes.flatMap(_.find(p)).headOption
    def isEmpty: Boolean = !classes.exists(_.nonEmpty)
    class AclSeqs(seqs: Seq[AclEntry]*) {
    def find(p: AclEntry => Boolean): Option[AclEntry] = {
Member
Should this have a comment to remind the reader that this style is for optimization?
Member
Author
I think this is obvious, no? `find` should generally short-circuit and not go through all the items. That's how it works for all collection implementations.
I think this kind of comment makes sense in `matchingAcls`, where I added one.
Member
Author
@chia7712 I looked at the code again and I guess the intent may not be clear. I added a comment that hopefully clarifies.
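The short-circuiting behaviour discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the merged code: `FindSketch`, `findEager`, and `findLazy` are hypothetical names, and a generic element type stands in for `AclEntry`. The eager form mirrors the one-line `flatMap(...).headOption` shown in the diff context; the lazy form is one possible way to stop at the first inner collection that contains a match.

```scala
object FindSketch {
  // Eager: Seq.flatMap runs find on every inner Seq before headOption
  // picks the first result.
  def findEager[A](seqs: Seq[Seq[A]])(p: A => Boolean): Option[A] =
    seqs.flatMap(_.find(p)).headOption

  // Lazy: Iterator.flatMap only moves on to the next inner Seq when the
  // current one produced no match, so the search stops at the first hit.
  def findLazy[A](seqs: Seq[Seq[A]])(p: A => Boolean): Option[A] = {
    val it = seqs.iterator.flatMap(_.find(p))
    if (it.hasNext) Some(it.next()) else None
  }
}
```

Both return the same `Option`; the difference is only in how many inner collections are searched, which matters when the predicate is expensive or the inner collections are large.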
rajinisivaram approved these changes on Jun 1, 2020
Contributor
rajinisivaram left a comment
@ijuma Thanks for the PR, LGTM
Member
Author
The issue affecting the Scala 2.12 build is unrelated. Looks like Gradle exited while running Streams tests.
ijuma added a commit that referenced this pull request on Jun 1, 2020
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Chia-Ping Tsai <chia7712@gmail.com>
ijuma added a commit to confluentinc/kafka that referenced this pull request on Jun 3, 2020
* apache-github/2.6: (32 commits)
  KAFKA-10083: fix failed testReassignmentWithRandomSubscriptionsAndChanges tests (apache#8786)
  KAFKA-9945: TopicCommand should support --if-exists and --if-not-exists when --bootstrap-server is used (apache#8737)
  KAFKA-9320: Enable TLSv1.3 by default (KIP-573) (apache#8695)
  KAFKA-10082: Fix the failed testMultiConsumerStickyAssignment (apache#8777)
  MINOR: Remove unused variable to fix spotBugs failure (apache#8779)
  MINOR: ChangelogReader should poll for duration 0 for standby restore (apache#8773)
  KAFKA-10030: Allow fetching a key from a single partition (apache#8706)
  Kafka-10064 Add documentation for KIP-571 (apache#8760)
  MINOR: Code cleanup and assertion message fixes in Connect integration tests (apache#8750)
  KAFKA-9987: optimize sticky assignment algorithm for same-subscription case (apache#8668)
  KAFKA-9392; Clarify deleteAcls javadoc and add test for create/delete timing (apache#7956)
  KAFKA-10074: Improve performance of `matchingAcls` (apache#8769)
  KAFKA-9494; Include additional metadata information in DescribeConfig response (KIP-569) (apache#8723)
  KAFKA-10056; Ensure consumer metadata contains new topics on subscription change (apache#8739)
  KAFKA-10029; Don't update completedReceives when channels are closed to avoid ConcurrentModificationException (apache#8705)
  KAFKA-10061; Fix flaky `ReassignPartitionsIntegrationTest.testCancellation` (apache#8749)
  KAFKA-9130; KIP-518 Allow listing consumer groups per state (apache#8238)
  KAFKA-9501: convert between active and standby without closing stores (apache#8248)
  MINOR: Relax Percentiles test (apache#8748)
  MINOR: regression test for task assignor config (apache#8743)
  ...