vhds: add support for VHDS in a static route-configuration#43852
vhds: add support for VHDS in a static route-configuration#43852adisuissa wants to merge 8 commits into
Conversation
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
|
Requires senior maintainer review, assigning Yan. |
botengyao
left a comment
There was a problem hiding this comment.
lgtm module two questions. Most of the logic is similar to the rds_impl.
/wait
| // Emulate a config-update information gathering using a dynamic RouteConfigurationReceiver. | ||
| config_update_info_ = std::make_unique<RouteConfigUpdateReceiverImpl>( | ||
| route_config_provider_manager.protoTraits(), factory_context); | ||
| config_update_info_->onRdsUpdate(config, ""); |
There was a problem hiding this comment.
q, if the static route config + VHDS is used, does this change the validate_clusters behavior between static and RDS? https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route.proto.html
There was a problem hiding this comment.
Sorry, I think I don't fully understand the question.
The behavior should be as configured, that is, if validate_clusters is true, then the clusters are validated, and if not, they are not validated.
This validate_clusters should define the behavior of the route-configuration, regardless of whether vhds is configured or not.
There was a problem hiding this comment.
yes, it is configurable, I was taking a look at the difference between static and RDS (with AI's help), I meant the default behavior change, by emulating the static to a RDS, will the default change? e.g., static+VHDS will now not validate it if it is unset.
There was a problem hiding this comment.
I thought that RDS+VHDS will only use the configured knob. Can you point me to where the default value is different?
There was a problem hiding this comment.
specifically this line above, and it is hard coded.
And here
envoy/api/envoy/config/route/v3/route.proto
Lines 98 to 101 in e465705
There was a problem hiding this comment.
thanks for the reference.
I believe this was added in #20577 and I don't have a full context to why it was allowed in the first place. @JuniorHsu do you have any context on why there's no cluster validation?
There was a problem hiding this comment.
I have a hunch that my previous patch didn't take virtual host into account. i'm unsure if route level will check eventually tbh.
validate_clusters is trying to prevent sr drop for unknown cluster. We do use virtual host but unsure if we have sr drop -- at least i didn't hear from other's report.
| } | ||
| // If no VHDS is configured, immediately notify the callback that the | ||
| // virtual-host doesn't exist. | ||
| auto current_cb = route_config_updated_cb.lock(); |
There was a problem hiding this comment.
should we put the weak_ptr lock inside the post?
There was a problem hiding this comment.
I think current_cb is used in both the if statement, and inside the dispatched lambda.
Maybe I'm missing something in your comment, so feel free to clarify.
There was a problem hiding this comment.
though there is no vhds, the pointer is upgrade to a shared_ptr, which means the posted lambda can keep the callback alive even if the stream/filter is destroyed before the dispatcher runs it. This seems different from the existing RDS/VHDS pattern:
thread_local_dispatcher.post([route_config_updated_cb] {
if (auto cb = route_config_updated_cb.lock()) {
(*cb)(false);
}
});
There was a problem hiding this comment.
When I looked at the code I tried to avoid a case where the callback is removed while it is posted to the worker thread. Specifically I don't think that posting an element that will then be out of scope is a good idea.
Can you point me to the lines of code that you mentioned in your comment?
I may be able to give a better answer if I'll see what exactly you are comparing against.
There was a problem hiding this comment.
envoy/source/common/router/rds_impl.cc
Lines 153 to 158 in e465705
here are the pointers, and this callback can be from the on_demand filter, and the it was a shard_ptr before and a week_ptr is passed to the provider, and the shared_ptr will be reseted during the on-demand filter destroy.
There was a problem hiding this comment.
Thanks for the reference from the original code which allows me to better understand your question.
RE the approach, I thought that acquired the weak_ptr lock makes it a shared pointer, which will keep the object for a longer time (otherwise there will be a race when the on-demand filter is destroyed).
I think this was observed in #9784 and the fix in #11341 is trying to fix the issue by resetting the pointer on destroy, but I'm not sure if that's the right way to check it.
I'll add a similar test to 'VhdsOnDemandUpdateHttpConnectionCloses' that was added in #11341 and run this with tsan.
I'm willing to change this, I just think that the original fix might not be safe.
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
| - source/common/grpc/google_grpc_creds_impl.cc | ||
| - source/server/drain_manager_impl.cc | ||
| - source/common/router/rds_impl.cc | ||
| - source/common/router/static_route_provider_impl.cc |
There was a problem hiding this comment.
I've added a TODO to clean the usage up, and I'll remove it in a near-future PR (requires conversion of c'tors to create functions that I think should be done in a separate PR to avoid a large refactor as part of this PR).
|
I'm taking a step back here. |
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
|
@yanavlasov PTAL |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements Virtual Host Discovery Service (VHDS) support for static route configurations, allowing them to handle on-demand subscriptions and updates. The implementation introduces a VhdsContext within StaticRouteConfigProviderImpl to manage thread-local configurations and subscription logic. The changes also include new unit tests and the enablement of static route types in VHDS integration tests. Review feedback identifies a thread-safety issue in requestVirtualHostsUpdate where configuration data is accessed across threads and suggests refactoring the constructor into a factory method to improve error handling and avoid exceptions.
| auto alias = VhdsSubscription::domainNameToAlias( | ||
| config_update_info_->protobufConfigurationCast().name(), for_domain); |
There was a problem hiding this comment.
Accessing config_update_info_ on a worker thread is not thread-safe because it is modified on the main thread during onConfigUpdate. Since the route configuration name is static for this provider, it should be captured during construction and stored as a member of VhdsContext to avoid this race condition.
auto alias = VhdsSubscription::domainNameToAlias(route_config_name_, for_domain);There was a problem hiding this comment.
I think this code is safe, since protobufConfigurationCast() is immutable. However it would be much safer if the const correctness would be reflected in the config_update_info_ type by having it be pointing to a const object. This would make it clear that the object can be used across threads. Maybe file an Issue to track this if a quick followup is not possible?
| std::weak_ptr<Http::RouteConfigUpdatedCallback> cb_; | ||
| }; | ||
|
|
||
| RouteConfigUpdatePtr config_update_info_; |
There was a problem hiding this comment.
Alternatively make it const, so it is clear that the pointer itself is immutable.
| route_config_provider_manager_(route_config_provider_manager) {} | ||
| route_config_provider_manager_(route_config_provider_manager) { | ||
| if (config.has_vhds()) { | ||
| vhds_context_ = std::make_unique<VhdsContext>(config, factory_context, *this, |
There was a problem hiding this comment.
I think this can be moved into the initializer list.
| } | ||
| // If no VHDS is configured, immediately notify the callback that the | ||
| // virtual-host doesn't exist. | ||
| thread_local_dispatcher.post([current_cb = route_config_updated_cb] { |
There was a problem hiding this comment.
Does this need to be dispatched via the event loop?
| auto alias = VhdsSubscription::domainNameToAlias( | ||
| config_update_info_->protobufConfigurationCast().name(), for_domain); |
There was a problem hiding this comment.
I think this code is safe, since protobufConfigurationCast() is immutable. However it would be much safer if the const correctness would be reflected in the config_update_info_ type by having it be pointing to a const object. This would make it clear that the object can be used across threads. Maybe file an Issue to track this if a quick followup is not possible?
| std::weak_ptr<Http::RouteConfigUpdatedCallback> cb_; | ||
| }; | ||
|
|
||
| RouteConfigUpdatePtr config_update_info_; |
There was a problem hiding this comment.
Alternatively make it const, so it is clear that the pointer itself is immutable.
| Rds::RouteConfigProviderManager& route_config_provider_manager) | ||
| : factory_context_(factory_context), tls_(factory_context.threadLocal()) { | ||
| // Emulate a config-update information gathering using a dynamic RouteConfigurationReceiver. | ||
| config_update_info_ = std::make_unique<RouteConfigUpdateReceiverImpl>( |
There was a problem hiding this comment.
I think to make thread safety clear here, I suggest making both the pointer and the pointee const. Gemini is correct to point that this is a weak area, even though right now I think it is safe.
|
/wait |
Commit Message: vhds: add support for VHDS in a static route-configuration
Additional Description:
Prior to this work VHDS wasn't working when used in a static RouteConfiguration (that is not fetched via RDS).
This PR enables this feature.
It is inspired by the code from source/common/router/rds_impl.cc.
Risk Level: low - things that were configured and did not work will now work as expected.
Testing: Added unit and integration tests.
Docs Changes: N/A
Release Notes: Added.
Platform Specific Features: N/A