Config switch to turn on/off Quic processing #9679

Merged
mattklein123 merged 67 commits into envoyproxy:master from nezdolik:quic-switch
May 15, 2020

Conversation

@nezdolik
Member

@nezdolik nezdolik commented Jan 14, 2020

Signed-off-by: Kateryna Nezdolii nezdolik@spotify.com

Description:
Extends the QUIC listener configuration with a runtime feature flag for enabling/disabling QUIC processing. The flag controls the onReadReady() and onWriteReady() callback methods and is enabled by default.
Risk Level: Low/Medium
Testing: Unit tests+Integration tests.
Docs Changes: NA?
Release Notes:
Fixes #9230
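
As a rough illustration of the behavior described above, the sketch below gates read and write callbacks on a flag that can be flipped at runtime. `FeatureFlag` and `GatedListener` are hypothetical stand-ins invented for this example; they are not Envoy's runtime classes or the actual `ActiveQuicListener`, and the counters only exist to make the effect observable.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for a runtime-controlled boolean (not Envoy's API).
class FeatureFlag {
public:
  explicit FeatureFlag(bool default_value) : value_(default_value) {}
  bool enabled() const { return value_.load(); }
  void set(bool value) { value_.store(value); }

private:
  std::atomic<bool> value_;
};

// Hypothetical listener whose callbacks become no-ops while the flag is off.
class GatedListener {
public:
  explicit GatedListener(const FeatureFlag& enabled) : enabled_(enabled) {}

  void onReadReady() {
    if (!enabled_.enabled()) {
      return; // Processing is switched off: skip reading packets.
    }
    ++reads_processed_;
  }

  void onWriteReady() {
    if (!enabled_.enabled()) {
      return; // Processing is switched off: skip writing packets.
    }
    ++writes_processed_;
  }

  std::size_t readsProcessed() const { return reads_processed_; }
  std::size_t writesProcessed() const { return writes_processed_; }

private:
  const FeatureFlag& enabled_;
  std::size_t reads_processed_{0};
  std::size_t writes_processed_{0};
};
```

Because the flag is checked on every callback rather than once at construction, flipping it takes effect on the next event without recreating the listener.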

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #9679 was opened by nezdolik.

see: more, trace.

Member Author

cc @alyssawilk @danzh2010 @mattklein123 this is an initial draft of the config layout.

Comment thread api/envoy/api/v2/listener/udp_listener_config.proto Outdated
Contributor

@alyssawilk alyssawilk left a comment

Looks good!

Some part of me wants to encourage one big enum (since when we disable QUIC we make the shutdown mode call based on the reason we're shutting down) but honestly this is more clear and will hopefully avoid any decision paralysis when dealing with production fires. And folks can still push a different shutdown mode with their disable if their default doesn't match their disaster :-)

@mattklein123 mattklein123 self-assigned this Jan 16, 2020
Member

@mattklein123 mattklein123 left a comment

Thanks for working on this. Flushing out a few comments.

/wait

Comment thread api/envoy/api/v2/listener/udp_listener_config.proto Outdated
Comment on lines 41 to 43
Member

Sorry I'm a little confused about what is being proposed here. Do you mind adding some more comments?

Member

+1

Member Author

@mattklein123 @htuch for turning UDP/QUIC processing off and on, my proposal is to have a boolean switch on the UDP listener config plus a shutdown mode field (for the shutdown type). I submitted this draft config to get community agreement on the approach; code will follow.

Member

Why don't we just remove the listener and drain as normal? Trying to grok what is distinct here vs. any other listener that comes to life at some point, starts accepting, and then is removed and drained.

Member Author

@nezdolik nezdolik Jan 20, 2020

@htuch thanks for the suggestion. I have not worked much with LDS, so I missed that option. I agree that this is the cleaner way; however, the listener manager currently supports only draining (graceful shutdown) of active listeners:
https://github.com/envoyproxy/envoy/blob/master/source/server/listener_manager_impl.cc#L668
If we go with the LDS approach, it will require a code change to support multiple draining modes, right?

Contributor

I guess it depends how we structure shared listeners.
I thought if we had QUIC and QUIC-proxy listening on UDP/443 they'd share a listener, and some higher level filter would determine which packets went where. In that case, as said, it'd be nice to be able to disable them one at a time. We can deal without but it's a nice to have.
If we think we'd have a QUIC/443 listener and a QUIC-proxy/443 listener just sharing incoming traffic via some combination of SO_REUSEPORT and BPF this would be fine.
I'd assumed we were going to do more of the former, at which point a non-listener kill switch has value IMO (though again we can live without it)

Member

My actual assumption (potentially naive) was that QUIC and QUIC-proxy would probably be different deployments of the binary in different auto scaling groups, etc. What is the use case in which you envision the same process doing both?

Contributor

Generic example is for edge proxies with low-value certs that do transparent proxying for latency gains. QUIC-terminate locally served content, UDP proxy where you'd TCP proxy because DNS SRV doesn't work.
We have two different frontline deployments which do TLS / TCP proxy based on VIP and one of them did QUIC+Udp-proxy for years.
We haven't had an issue where we needed to turn off one and not the other yet, but as tunneling over QUIC picks up steam (and that team has agreed to OSS once we land core QUIC) I think it gets more useful to turn different modes down separately.
We can always start simple and add more later though!

Member

We can always start simple and add more later though!

That would be my feeling. Right now we don't have any filter chain concept for UDP (I would like to fix that as part of a general listener API cleanup), and without that I'm not sure how we would do something like ^ today. IMO right now I think we could just have a per-listener runtime kill switch for UDP listeners? Or even just for the QUIC listener config itself?

Member Author

Thanks for the feedback, it's valuable information to get familiar with. So we agreed to introduce separate config switches for UDP and QUIC listeners.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
 ActiveRawUdpListenerConfigFactory::createActiveUdpListenerFactory(
-    const Protobuf::Message& /*message*/) {
+    const Protobuf::Message& message) {
   auto& config =
Member Author

For now this cast will not work; the UDP listener proto config needs to be reorganized to have one top-level message.

Member

@mattklein123 mattklein123 left a comment

Thanks for working on this. Flushing out a few more comments.

/wait

google.protobuf.Duration crypto_handshake_timeout = 3;
}

message ActiveQuicListenerConfig {
Member

I would probably just name this QuicListenerConfig. I think we can drop the Active (having it in the other place is a mistake IMO).

In terms of the shutoff, what I'm proposing is just having a RuntimeFeatureFlag which says whether to process packets or not. I think this is ultimately the kill switch we are looking for? @alyssawilk @danzh2010 WDYT?

Contributor

Again I think it's not the switch I'm looking for in the long term (because not granular enough) but I'm fine landing whatever controls we think are useful today and we can add the finicky ones once we're running in prod and care more :-)

}

message ActiveRawUdpListenerConfig {
enum UdpShutDownMode {
Member

I'm not completely positive we want to bother with this kill switch for raw UDP right now. Maybe defer that for later? If we do it here, we should de-dup the messages and have a common configuration message of some type.

Kateryna Nezdolii added 3 commits January 31, 2020 11:11
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@stale

stale Bot commented Feb 3, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale Bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 3, 2020
Kateryna Nezdolii added 3 commits February 3, 2020 17:07
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@stale stale Bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 4, 2020
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Member Author

nezdolik commented Feb 4, 2020

I spent most of the time on a task that may be out of scope for this change: propagating Envoy primitives (like Runtime) all the way down from server instance -> worker -> connection handler -> active QUIC listener factory -> active QUIC listener, which introduced a cyclic dependency between Bazel modules. Since the current code is not production ready (and ActiveQuicListenerFactory is not instantiated from conn_handler), the change may be fine as it is (adding a Runtime field to ActiveQuicListenerFactory). In the long run, though, the code needs to be refactored, possibly by introducing a UdpListenerFactoryContext that would provide access to resources like Dispatcher, Runtime::Loader, etc.
I still need to complete the tests.
I would appreciate opinions on whether the refactoring should be part of this task.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@mattklein123
Member

Check out Runtime::LoaderSingleton::getExisting() which will probably make this task easier. This is one case where we break our rule about no statics.
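
To make the suggestion concrete, here is a minimal sketch of the "get the existing singleton" pattern: a process-wide accessor that returns a pointer, or null if nothing has been installed yet, so deeply nested code can read runtime values without having the loader threaded through every constructor. The `Loader` and `LoaderSingleton` classes below are simplified, hypothetical stand-ins written for this example and do not reproduce Envoy's actual implementation; only the `quic.enabled` key is taken from this thread.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical holder of runtime values (not Envoy's Runtime::Loader).
class Loader {
public:
  void set(const std::string& key, bool value) { values_[key] = value; }

  bool getBoolean(const std::string& key, bool default_value) const {
    auto it = values_.find(key);
    return it == values_.end() ? default_value : it->second;
  }

private:
  std::unordered_map<std::string, bool> values_;
};

// Process-wide access point. getExisting() returns nullptr until a loader
// has been installed with initialize().
class LoaderSingleton {
public:
  static void initialize(Loader* loader) { instance() = loader; }
  static void clear() { instance() = nullptr; }
  static Loader* getExisting() { return instance(); }

private:
  static Loader*& instance() {
    static Loader* loader = nullptr;
    return loader;
  }
};

// Code far from the server instance can consult the flag without having a
// loader passed in. Falls back to "enabled" when no loader is installed.
bool quicEnabled() {
  Loader* loader = LoaderSingleton::getExisting();
  return loader == nullptr ? true : loader->getBoolean("quic.enabled", true);
}
```

The trade-off is the usual one for statics: call sites get simpler, but tests must install and clear the singleton themselves to stay isolated.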

@nezdolik
Member Author

@danzh2010 I switched to a real Runtime object and added an end-to-end test for empty config. Please take a look; I believe it addresses your review comments.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Contributor

@danzh2010 danzh2010 left a comment

Only a few nits!

   quic::QuicBufferedPacketStore* const buffered_packets =
       quic::test::QuicDispatcherPeer::GetBufferedPackets(quic_dispatcher_);
-  configureQuicRuntimeFlag(/* runtime_enabled = */ true);
+  // configureQuicRuntimeFlag(/* runtime_enabled = */ true);
Contributor

remove

Comment thread bazel/envoy_internal.bzl Outdated
     "-Wall",
     "-Wextra",
-    "-Werror",
+    #"-Werror",
Contributor

what is this for?

Member Author

@nezdolik nezdolik May 4, 2020

@danzh2010 I was getting a complaint from gcc about quiche library code:

bazel-out/k8-fastbuild/bin/external/com_googlesource_quiche/quiche/quic/core/quic_framer.cc: In member function 'bool quic::QuicFramer::AppendIetfAckFrameAndTypeByte(const quic::QuicAckFrame&, quic::QuicDataWriter*)':
bazel-out/k8-fastbuild/bin/external/com_googlesource_quiche/quiche/quic/core/quic_framer.cc:5434:40: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
         writer->remaining() - ecn_size <
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
             QuicDataWriter::GetVarInt62Len(gap) +
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 QuicDataWriter::GetVarInt62Len(ack_range)) {
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

Contributor

I'm fixing this issue in quiche now. It will be updated soon.
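
For readers unfamiliar with why `-Wsign-compare` flags such expressions: when a signed and an unsigned integer of the same width are compared, the signed operand is converted to unsigned, so a negative value wraps around to a very large one. The sketch below illustrates the pitfall and one common way to avoid it. The function names and parameter types are assumptions made for this example; the actual types in `quic_framer.cc` are not shown in this thread, and this is not the fix that landed in quiche.

```cpp
#include <cstdint>

// Mimics what the implicit conversion does in a mixed comparison: the signed
// difference is converted to unsigned, so a negative value wraps around.
bool naiveDoesNotFit(int64_t remaining, int64_t reserved, uint64_t needed) {
  return static_cast<uint64_t>(remaining - reserved) < needed;
}

// Handles the negative case explicitly before comparing in unsigned space.
bool safeDoesNotFit(int64_t remaining, int64_t reserved, uint64_t needed) {
  const int64_t available = remaining - reserved;
  if (available < 0) {
    return true; // Nothing is available, so nothing fits.
  }
  return static_cast<uint64_t>(available) < needed;
}
```

With `remaining = 1`, `reserved = 2` and `needed = 10`, the naive version reports that the data fits because the difference wraps around, while the safe version reports that it does not.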

void resumeListening() override;
void shutdownListener() override;

bool enabled() { return enabled_.enabled(); }
Contributor

I don't strongly feel this interface is needed. Can you inline it at call site? If it's only used in test, accessing via a peer class is better.

Member Author

@danzh2010 Alright. I had looked up that method in other Envoy modules and followed their example.

Contributor

This is not the only definition of quic listener being enabled. I would prefer not to add this interface.
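
The "peer class" alternative mentioned above can be sketched as follows: the production class keeps its state private and befriends a test-only helper, so tests can inspect internals without widening the public interface. `Listener` and `ListenerPeer` are hypothetical names invented for this example, not the classes from this PR.

```cpp
// Hypothetical production class: no public accessor for its enabled state.
class Listener {
public:
  explicit Listener(bool enabled) : enabled_(enabled) {}

  // Returns true if the packet was processed, false if it was skipped.
  bool processPacket() {
    if (!enabled_) {
      return false;
    }
    ++packets_processed_;
    return true;
  }

private:
  friend class ListenerPeer; // Grants test-only access to private state.

  bool enabled_;
  int packets_processed_{0};
};

// Test-only helper that lives alongside the tests, not in production code.
class ListenerPeer {
public:
  static bool enabled(const Listener& listener) { return listener.enabled_; }
  static int packetsProcessed(const Listener& listener) {
    return listener.packets_processed_;
  }
};
```

This keeps "is the listener enabled" out of the public API, which matters when, as noted above, the flag is not the only thing that determines whether the listener is effectively enabled.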

Comment on lines +89 to +91
quic_listener_ = std::make_unique<ActiveQuicListener>(*dispatcher_, connection_handler_,
listen_socket_, listener_config_,
quic_config_, nullptr, enabledFlag());
Contributor

Overriding enabledFlag() is a bit indirect. Can you add two methods like below:

Network::ActiveUdpListenerFactoryPtr createQuicListenerFactory(const std::string& yaml) {
  std::string listener_name = QuicListenerName;
  auto& config_factory =
      Config::Utility::getAndCheckFactoryByName<Server::ActiveUdpListenerConfigFactory>(
          listener_name);
  ProtobufTypes::MessagePtr config = config_factory.createEmptyConfigProto();
  TestUtility::loadFromYaml(yaml, *config);
  return config_factory.createActiveUdpListenerFactory(*config, /*concurrency=*/1);
}

virtual std::string yamlForQuicConfig() {
  return  R"EOF(
runtime_key: "quic.enabled"
default_value: true
)EOF";
}

And use them here:

EXPECT_CALL(listener_config_, listenSocketFactory());
EXPECT_CALL(listener_config_.socket_factory_, getListenSocket).WillOnce(Return(listen_socket_));
quic_listener_ = createQuicListenerFactory(yamlForQuicConfig())->createActiveUdpListener(*dispatcher_, connection_handler_, ...);

In ActiveQuicListenerEmptyFlagConfigTest,

std::string yamlForQuicConfig() override {
 return R"EOF(
    max_concurrent_streams: 10
  )EOF";
}

Comment thread test/extensions/quic_listeners/quiche/active_quic_listener_test.cc
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@repokitteh-read-only

CC @envoyproxy/api-watchers: FYI only for changes made to api/.

🐱

Caused by: #9679 was synchronize by nezdolik.

see: more, trace.

Kateryna Nezdolii added 3 commits May 6, 2020 23:28
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>

quic_listener_ = std::make_unique<ActiveQuicListener>(
*dispatcher_, connection_handler_, listen_socket_, listener_config_, quic_config_, nullptr);
EXPECT_CALL(listener_config_, listenSocketFactory())
Member Author

This is needed to wire up the mocks that are used inside the listener factory when creating the QUIC listener.

Member

For this type of thing it's better to use ON_CALL(...).WillByDefault()

Kateryna Nezdolii added 10 commits May 7, 2020 00:06
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Member Author

nezdolik commented May 8, 2020

There is failing coverage: https://343614-65214191-gh.circle-artifacts.com/0/coverage/index.html but none of it is related to this PR. Going to merge master and see if it helps.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Member

@mattklein123 mattklein123 left a comment

LGTM other than small nits. @danzh2010 can you take a final pass? Thank you!

/wait

     Network::ListenerConfig& listener_config, const quic::QuicConfig& quic_config,
-    Network::Socket::OptionsSharedPtr options);
+    Network::Socket::OptionsSharedPtr options,
+    const envoy::config::core::v3::RuntimeFeatureFlag enabled);
Member

nit: pass by const ref here and below

Member Author

fixed
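
The nit above comes down to avoiding a copy on every call. A small sketch of the difference, using a hypothetical `FlagConfig` type that counts its own copies (it is not the real `RuntimeFeatureFlag` proto):

```cpp
#include <cstddef>
#include <string>

// Hypothetical config object that records how many times it has been copied.
struct FlagConfig {
  FlagConfig() = default;
  FlagConfig(const FlagConfig& other) : runtime_key(other.runtime_key) {
    ++copies();
  }

  // Shared copy counter, kept in a function-local static.
  static int& copies() {
    static int count = 0;
    return count;
  }

  std::string runtime_key;
};

// `const` by value: the caller's object is still copied into the parameter.
std::size_t keyLengthByValue(const FlagConfig config) {
  return config.runtime_key.size();
}

// `const` reference: no copy is made; the function reads the caller's object.
std::size_t keyLengthByRef(const FlagConfig& config) {
  return config.runtime_key.size();
}
```

For a protobuf message with string fields, the by-value form pays for an allocation and copy that the reference form avoids, while both give the callee the same read-only view.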


danzh2010 previously approved these changes May 13, 2020
Contributor

@danzh2010 danzh2010 left a comment

LGTM overall, just a few nits in test.

 ActiveQuicListenerFactory::ActiveQuicListenerFactory(
     const envoy::config::listener::v3::QuicProtocolOptions& config, uint32_t concurrency)
-    : concurrency_(concurrency) {
+    : concurrency_(concurrency), enabled_(config.enabled()) {
Contributor

Thanks for looking into this!

ReadFromClientSockets();
}

TEST_P(ActiveQuicListenerTest, QuicProcessingDisabled) {
Contributor

This test is already tested in QuicProcessingDisabledAndEnabled, right? Maybe remove it?

Member Author

removed

Comment thread test/extensions/quic_listeners/quiche/active_quic_listener_test.cc
Kateryna Nezdolii added 2 commits May 14, 2020 18:02
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Member Author

FAILED_PRECONDITION: there are no bots capable of executing.... in ci, need to wait for fix...

Kateryna Nezdolii added 2 commits May 15, 2020 10:26
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Member Author

@danzh2010 @mattklein123 applied review comments.

Member

@mattklein123 mattklein123 left a comment

Thanks!

5 participants