Skip to content

eds: decrease computational complexity of updates#11442

Merged
htuch merged 24 commits into
envoyproxy:masterfrom
pgenera:eds-nsquared
Jul 9, 2020
Merged

eds: decrease computational complexity of updates#11442
htuch merged 24 commits into
envoyproxy:masterfrom
pgenera:eds-nsquared

Conversation

@pgenera
Copy link
Copy Markdown
Contributor

@pgenera pgenera commented Jun 4, 2020

Commit Message: Makes BaseDynamicClusterImpl::updateDynamicHostList O(n) rather than O(n^2)
Additional Description: Instead of calling .erase() on list iterators as we find them, we swap with the end of the list and erase after iterating over the list. This shows a ~3x improvement in execution time in the included benchmark test.
Risk Level: Medium. No reordering happens to the endpoint list. Not runtime guarded.
Testing: New benchmark, existing unit tests pass (and cover the affected function).
Docs Changes: N/A
Release Notes: N/A

#2874 #11362

pgenera added 4 commits June 3, 2020 19:40
…impl.

Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jun 4, 2020

I'm happy to put this behind a runtime guard if it seems prudent.

@pgenera pgenera marked this pull request as ready for review June 4, 2020 18:21
@jmarantz jmarantz self-assigned this Jun 4, 2020
Comment thread source/common/upstream/upstream_impl.cc Outdated
Comment thread source/common/upstream/upstream_impl.cc Outdated
Comment thread source/common/upstream/upstream_impl.cc Outdated
Comment thread source/common/upstream/upstream_impl.cc Outdated
@mattklein123
Copy link
Copy Markdown
Member

I'm happy to put this behind a runtime guard if it seems prudent.

IMO it's OK to not have a runtime guard for this, but I would raise the regression risk to medium at least. @snowp can you also take a look at this?

Signed-off-by: Phil Genera <pgenera@google.com>
@htuch
Copy link
Copy Markdown
Member

htuch commented Jun 9, 2020

This should be rebased on #11505.

pgenera added 3 commits June 11, 2020 13:05
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Comment thread source/common/upstream/upstream_impl.cc Outdated
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?

You'd have to patch the speed-test only into a different client.

Comment thread test/common/common/utility_test.cc Outdated
Comment thread test/common/common/utility_test.cc Outdated
Comment thread source/common/common/utility.h Outdated
Signed-off-by: Phil Genera <pgenera@google.com>
Comment thread test/common/common/utility_test.cc Outdated
Comment thread test/common/common/utility_test.cc Outdated
Comment thread test/common/common/utility_test.cc Outdated
Comment thread test/common/common/utility_test.cc Outdated
pgenera added 2 commits June 17, 2020 14:32
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jun 17, 2020

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?

You'd have to patch the speed-test only into a different client.

Done, its still ~3.2x improvement over the baseline. Results are linked from the description; I can be more explicit than that if you'd like.

jmarantz
jmarantz previously approved these changes Jun 17, 2020
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to you if you want to think about larger variable names.

This does need a @envoyproxy/senior-maintainers approval.

Comment thread source/common/upstream/upstream_impl.cc Outdated
@jmarantz
Copy link
Copy Markdown
Contributor

and thanks for doing this!

@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jun 17, 2020

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?
You'd have to patch the speed-test only into a different client.

Done, its still ~3.2x improvement over the baseline. Results are linked from the description; I can be more explicit than that if you'd like.

After a bit of thought I noticed the performance of priorityAndLocalWeighted is about ~30% slower. Notably those tests don't exercise any of the n^2 logic (eg, with one iteration none of those .erase() calls happen), but I'm still surprised that the visitor-predicate-pattern is measurably slower than iterating in situ. Even with this mysterious observation, I think a 320% improvement in what I think is the common case is worth a 30% worse performance on the first iteration.

@jmarantz
Copy link
Copy Markdown
Contributor

Quick check: are you comparing optimized runs? Without optimization, inlining, and collapsing of dead logic I could see this being a significant performance degradation.

@jmarantz
Copy link
Copy Markdown
Contributor

I see in your comment you did use -c opt. It's probably worth an iteration with cachegrind or callgrind focusing on the troublesome use-case to see just what's up. Possibly the lambda context adds a level of indirection through a generated structure that might not be possible to fully optimize away.

If the absolute per-itereration perf penalty is not too great it might be fine to just explain that, maybe as a comment in the code for posterity.

Signed-off-by: Phil Genera <pgenera@google.com>
jmarantz
jmarantz previously approved these changes Jun 29, 2020
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! Praying to the gods of clang it goes through!

@envoyproxy/senior-maintainers

Comment thread test/benchmark/main.cc Outdated
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jun 29, 2020

Great, thanks! Praying to the gods of clang it goes through!

I do not think we have been smiled upon :D. I'll be out the rest of this week, but will poke my head in late tonight in case there's something easy to do.

Comment thread test/benchmark/main.cc Outdated
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jul 2, 2020

also merge master to hopefully pick up a fix that was made to the http2 integration test for tsan.

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jul 2, 2020

/wait

pgenera added 2 commits July 6, 2020 13:00
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jul 7, 2020

also merge master to hopefully pick up a fix that was made to the http2 integration test for tsan.

Done and done. And it appears to have helped!

jmarantz
jmarantz previously approved these changes Jul 8, 2020
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@envoyproxy/senior-maintainers

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo a nit.
/wait

Comment thread test/benchmark/main.h
Signed-off-by: Phil Genera <pgenera@google.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pgenera
Copy link
Copy Markdown
Contributor Author

pgenera commented Jul 8, 2020

Looking through the CI failures:

  • coverage: [ FAILED ] IpVersionsClientType/HdsIntegrationTest.SingleEndpointUnhealthyHttp/5, where GetParam() = (4-byte object <00-00 00-00>, 4-byte object <01-00 00-00>, 0): unrelated
  • windows: //test/extensions/filters/http/router:auto_sni_integration_test FAILED: unrelated

Both of these have high-flakiness warnings when I look at them in azure. They all (of course) pass locally and with RBE.

Signed-off-by: Phil Genera <pgenera@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jul 9, 2020

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@htuch htuch merged commit b1e62a3 into envoyproxy:master Jul 9, 2020
@pgenera pgenera deleted the eds-nsquared branch July 13, 2020 16:07
Comment thread test/benchmark/main.cc
return 0;
}

skip_expensive_benchmarks = skip_switch.getValue();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some big nice WARNING when this flag is enabled in order to increase the chances of someone noticing the difference between envoy_cc_benchmarks and tests for those benchmarks?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #12121

scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
Makes BaseDynamicClusterImpl::updateDynamicHostList O(n) rather than O(n^2)

Instead of calling .erase() on list iterators as we find them, we swap with the end of the list and erase after iterating over the list. This shows a ~3x improvement in execution time in the included benchmark test.

Risk Level: Medium. No reordering happens to the endpoint list. Not runtime guarded.
Testing: New benchmark, existing unit tests pass (and cover the affected function).
Docs Changes: N/A
Release Notes: N/A

Relates to envoyproxy#2874 envoyproxy#11362

Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants