Skip to content

quiche: implement quic_port_utils#6488

Merged
alyssawilk merged 17 commits intoenvoyproxy:masterfrom
danzh2010:port_util
Apr 11, 2019
Merged

quiche: implement quic_port_utils#6488
alyssawilk merged 17 commits intoenvoyproxy:masterfrom
danzh2010:port_util

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

Implement QuicPickUnusedPortOrDie() and QuicRecyclePort() backed by Envoy::Network::Test::findOrCheckFreePort().
Added include_prefix to envoy_cc_test_library argument list and pass it down. So that quiche test only impl's can be included by api header files with relative #include path.

Risk Level: low, not in use
Testing: added new tests in test/extensions/quic_listeners/quiche/platform/quic_platform_test.cc
Part of #2557

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @wu-bin

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🙀 Error while processing event:

evaluation error
error: function http error: Post https://circleci.com/api/v1.1/project/github/envoyproxy/envoy/193731/retry?circle-token=57b4b7966a75db7541e68405ceffb9ffae846ef9: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

)

envoy_cc_test_library(
name = "quic_platform_port_utils_impl_lib",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this library to //test? That way the source code in this library will be excluded from the coverage ci.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its header file has to be in the same directory as other impl files unless we want to config the gen rule to change the #include path in its API file which doesn't seem nice to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

const int rc = ::bind(fd, reinterpret_cast<const sockaddr*>(&ip_.ipv4_.address_),
sizeof(ip_.ipv4_.address_));
return {rc, errno};
auto& os_syscalls = Api::OsSysCallsSingleton::get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to this PR? Or is it a drive-by fix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's adding a mock-able layer of system calls for testing.

return port;
}
}
RELEASE_ASSERT(false, "Failed to pick a port for test.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just ASSERT? I think in dbg mode we also want it to die if it can't pick a unused port.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RELEASE_ASSERT die in both case, but ASSERT doesn't die in opt mode.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I got it reversed, sorry.

// Only try to get a port for limited times.
std::vector<Envoy::Network::Address::IpVersion> supported_versions =
Envoy::TestEnvironment::getIpVersionsForTest();
for (size_t i = 0; i < 30000; ++i) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30000 iterations looks very high:) I think O(10) is enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other platforms, we use similar large numbers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go smaller here as well if for no other reason than I trust CircleCi less than blaze and chrome's test frameworks :-/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce to 300

// Fail bind call's to mimic port exhaustion.
EXPECT_CALL(os_sys_calls, bind(_, _, _))
.WillRepeatedly(Return(Envoy::Api::SysCallIntResult{-1, EADDRINUSE}));
EXPECT_DEATH(QuicPickUnusedPortOrDie(), "Failed to pick a port for test.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #6393 I've changed

  1. QuicPlatformTest from TEST to TEST_F.
  2. EXPECT_DEATH to EXPECT_DEATH_LOG_TO_STDERR.

Please sync and do the same for the new tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
wu-bin
wu-bin previously approved these changes Apr 8, 2019
Copy link
Copy Markdown
Contributor

@wu-bin wu-bin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest
/assign @alyssawilk

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

SslIpVersionsClientType/GrpcSslClientIntegrationTest.BasicSslRequestWithClientCert/IPv4_EnvoyGrpc failed coverage test. It shouldn't be related to this PR.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch can you glance at the bzl changes?

// Only try to get a port for limited times.
std::vector<Envoy::Network::Address::IpVersion> supported_versions =
Envoy::TestEnvironment::getIpVersionsForTest();
for (size_t i = 0; i < 30000; ++i) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go smaller here as well if for no other reason than I trust CircleCi less than blaze and chrome's test frameworks :-/

break;
}
if (port == 0) {
// Just get a port from findOrCheckFreePort(), check its usability in the rest address
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest -> other
(or rest of)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (size_t i = 0; i < 30000; ++i) {
uint32_t port = 0;
bool available = true;
for (auto ip_version : supported_versions) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably know this offhand, but do we need to loop here? If we're listening on ipv6 inaddr any, that covers ipv4 over mapped addresses but I'm not sure if listening on 127.0.0.1:444 and then ::444 would fail on the platforms we support.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I'm not sure if listening on 127.0.0.1:444 and then ::444 would fail on the platforms we support.
What do you mean?

The returned port of this function should be able to bind to both an v4 and v6 address (in separate test) if test supports both address families. So here we need check if we can create a socket and bind it to an address in both v4 and v6.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the version loop with checking on single version according to our discussion. Thanks for the suggestion!

@alyssawilk
Copy link
Copy Markdown
Contributor

FWIW if your coverage fail was gRPC I think that's fixed at HEAD, so a master merge may help

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment request, otherwise LGTM

std::vector<Envoy::Network::Address::IpVersion> supported_versions =
Envoy::TestEnvironment::getIpVersionsForTest();
ASSERT(!supported_versions.empty());
Envoy::Network::Address::IpVersion ip_version = supported_versions.size() == 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment as to why this works?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented a few lines below inside for loop.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I think it would be more useful here, if folks are reading top down though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dan Zhang <danzh@google.com>
alyssawilk
alyssawilk previously approved these changes Apr 9, 2019
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build changes look good, but just a comment on the port picking.

if (addr_port != nullptr && addr_port->ip() != nullptr) {
// Find a port.
return addr_port->ip()->port();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@curiouserrandy has also written code like this, so might have comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a lot of hassle to do something the OS will do for you (i.e. pick a port if you bind port 0), and additionally racy as well. Can you give me a sense as to why you're doing it this way as opposed to using bind(..., 0) at point of use?

I also share Harvey's concern about mixing apparently production and apparently test code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my reply to Harvey's comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. Pity :-}. Next question: As I read the code (including findOrCheckFreePort) what is being done is that the address passed is being bound with port == 0, the port returned by the kernel is being returned, and the bound socket is being released. Why is that wrapped in a "try 300 times" loop? If a bind with port 0 fails, I'd think there's something badly wrong somewhere, and retrying wouldn't help.

(Meta-comment: I'm still in the "asking questions to understand the code" phase of the review, so if you and Harvey get to closure, don't block on me.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. Pity :-}. Next question: As I read the code (including findOrCheckFreePort) what is being done is that the address passed is being bound with port == 0, the port returned by the kernel is being returned, and the bound socket is being released. Why is that wrapped in a "try 300 times" loop? If a bind with port 0 fails, I'd think there's something badly wrong somewhere, and retrying wouldn't help.
Would temporary running out of port lead to bind failure on port 0? Or some other transient system failure?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the loop made sense when we were doing v4/v6 checks but now that we've done away with that I suspect we can kill the loop off entirely.

It could be transient but it's functional equivalent to sleep(1)-and-try-again so I think we can skip

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


namespace quic {

int QuicPickUnusedPortOrDieImpl() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test-only, right? I would ideally like to see a better naming or structuring to indicate code is for test vs. prod., since there are different requirements here as far as error handling and robustness. Are we restricted by the QUICHE layout?

Also, why does QUICHE do this? In Envoy, we just bind zero on the server side and then communicate the effective port OOB to the client in integration tests. Could QUICHE do this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test-only, right? I would ideally like to see a better naming or structuring to indicate code is for test vs. prod., since there are different requirements here as far as error handling and robustness. Are we restricted by the QUICHE layout?
Talked with Harvey about a reasonable approach, such test only code can be moved to /test, and be included by a dummy impl.h file in /source.

Also, why does QUICHE do this? In Envoy, we just bind zero on the server side and then communicate the effective port OOB to the client in integration tests. Could QUICHE do this?
Quic e2e tests doesn't have this server->client channel to communicate the port it picks after starting listening.

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

Moved the definitions to //test/ PTAL

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: build_image (failed build)
🔨 rebuilding ci/circleci: docker (failed build)
🔨 rebuilding ci/circleci: ipv6_tests (failed build)

🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

@@ -0,0 +1,10 @@

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: superfluous blank line

Copy link
Copy Markdown
Contributor Author

@danzh2010 danzh2010 Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

// If it supports both v4 and v6, checking availability under v6 with IPV6_V6ONLY
// set to false is sufficient because such socket can be used on v4-mapped
// v6 address.
Envoy::Network::Address::IpVersion ip_version = supported_versions.size() == 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// consumed or referenced directly by other Envoy code. It serves purely as a
// porting layer for QUICHE.

#include "test/extensions/quic_listeners/quiche/platform/quic_port_utils_test_impl.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining this redirect for posterity? Thanks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@htuch
Copy link
Copy Markdown
Member

htuch commented Apr 10, 2019

Thanks @danzh2010, this is good to merge once CI passes.

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: ipv6_tests (failed build)

🐱

Caused by: a #6488 (comment) was created by @danzh2010.

see: more, trace.

@alyssawilk alyssawilk merged commit b81ff99 into envoyproxy:master Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants