On-demand VHDS implementation#8617
On-demand VHDS implementation#8617htuch merged 73 commits intoenvoyproxy:masterfrom dmitri-d:vhds-on-demand-restarted
Conversation
|
I decided to open a new PR instead of re-opening the original one (can be found here: https://github.com/envoyproxy/envoy/pull/6552/files). |
|
Drive by: please merge #7415 into this PR so we make sure we have docs for this feature. Thank you! |
|
|
fredlas
left a comment
There was a problem hiding this comment.
Partial review because github has started not accepting my comments? Let's see what's going on with that...
There was a problem hiding this comment.
This if seems a little suspect. It seems like it is checking for a not-really-error but not-really-right condition. The sort of thing where things can muddle along mostly working, and it's hard to track down what's going wrong. Going by the names of the functions involved, I think it might be more safe and solid to, rather than doing this if in here, instead guard all uses of requestVirtualHostsUpdate with it. I think that would also let you drop the bool return type. (If all of that is actually possible).
There was a problem hiding this comment.
I pulled the check for whether requestVirtualHostsUpdate is available to the very top (see extensions/filters/http/on_demand/on_demand_update.
fredlas
left a comment
There was a problem hiding this comment.
ok, now I am done with this round of review.
|
mattklein123
left a comment
There was a problem hiding this comment.
I haven't actually reviewed this change but I'm marking this requested changes to make sure we get VHDS fully documented for whatever exists today in the code base. This should include:
- Merging #7415 into this PR as proper RST docs with arch overview, etc.
- Getting VHDS up to parity with the fixes that I did in #8678.
Thank you!
|
LGTM |
|
|
@lambdai: the PR I was talking about during the last xDS call, could you take a look when you get a chance? |
|
There was a problem hiding this comment.
Thanks @dmitri-d. This is epic. It's a pretty big PR, so I've started my review with a few comments. I think that this is going to need a fair bit of review to get through. Also, as a general principle, PRs to the Envoy code base should either be large and mechanical or small and complicated (or even simple :) ). Please consider what can be done to simplify or reduce the amount of change in this PR.
My main initial set of questions are around the additional plumbing added for aliases. I find it somewhat surprising, I had envisaged aliases being far simpler to handle.
Tagging @lambdai and @stevenzzzz for implications of the update callbacks to the ownership and lifecycle of route config provider resources.
/wait
There was a problem hiding this comment.
Is it reasonable to promote alias req/resp to first class in mux and subscription?
There was a problem hiding this comment.
Removed the method (pls. see the conversation above re: delta_subscription_state's requested_aliases_ field).
lambdai
left a comment
There was a problem hiding this comment.
I need to digest this PR with design doc since ondemand involves close interaction with worker threads and main thread. Is the design doc named "on demand RDS"? I need to make sure I am reading the correct doc. Thanks!
There was a problem hiding this comment.
I think requestRouteConfigUpdate() will be PURE when the PR is merged
There was a problem hiding this comment.
There are way too many copies of aliases. IMHO this one should be shared_ptr and the below runOnAllThreads will capture the shared_ptr
There was a problem hiding this comment.
a counter-point: I don't think it's ever going to be large number of aliases in the set at one time (I expect a couple at most). Only two copies are being made: one from the original protobuf, the other one is put in the lambda context. Do you think shared_ptr is still worth it considering small size of aliases being copied and the overhead of accessing a shared_ptr?
There was a problem hiding this comment.
The const give me the hint that the set should be a shared_ptr among all threads
|
@dmitri-d could you pls link the issue or the design doc ? |
stevenzzzz
left a comment
There was a problem hiding this comment.
It'd be nice if we can split this PR into a control plane side PR + a data-plane side PR.
Please link a design doc or add some more doc about how on-demand rds works.
In my previous understanding, on-demand xDS is a xDS-horizontal framework change.
There was a problem hiding this comment.
shall we also check if the host is already in the route-table and continue from there. IIUC If RDS request doesn't return, this essentially hangs processing of every requests?
There was a problem hiding this comment.
The check is made inside the method: https://github.com/envoyproxy/envoy/pull/8617/files#diff-53e919b583d3290e6fddf324e7e69885R13
There was a problem hiding this comment.
I have some life-circle concern here, a stream may not live as long as a subscription, in which case this callback ends in unknown behavior.
There was a problem hiding this comment.
Do you mean the gRPC stream carrying the subscription? If so, it is a safe assumption; the subscription is a shared owner of a GrpcMux, which is the owner of the GrpcStream.
There was a problem hiding this comment.
no, I mean the per HTTP request ActiveStream.
There was a problem hiding this comment.
I pause the filter chain until a response arrives and the callback resumes its execution. Wouldn't that mean that the ActiveConnection sticks around too?
There was a problem hiding this comment.
No, I am saying a subscription may outlive an activeConnection. For example, with lambdai@'s recent change, a RDS subscription will outlive a listener. when that happens, the update_on_demand_callback.cb_() end in undefined behavior.
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
…started Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
ping @lambdai: fixed RouteConfigUpdatedCallback lifecycle issues that you pointed out |
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
| void RouteConfigUpdateReceiverImpl::collectResourceIdsInUpdate( | ||
| const Protobuf::RepeatedPtrField<envoy::service::discovery::v3alpha::Resource>& | ||
| added_resources) { | ||
| resource_ids_in_last_update_.clear(); |
There was a problem hiding this comment.
I'm not sure how actionable this is, but not a huge fan of building these mutable stateful objects, would have preferred to just deliver a LastUpdateInfo object along with the update to the receivers and then discarded. But, it's probably not a deal breaker.
There was a problem hiding this comment.
Agreed, not optimal. There's a TODO to split the state here: https://github.com/envoyproxy/envoy/pull/8617/files/8e7b8f2f43bf00042d538501a7a076da76a42c6e#diff-dcbe1eb34bb1b29e349d7c87c0f20337R61
|
LGTM modulo nits (and I'd like to verify the coverage report, I can't see it for some reason on CircleCI). Thanks for the fantastic work @dmitri-d! |
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
lambdai
left a comment
There was a problem hiding this comment.
The DS code path LGTM
I am not very familiar with the http filter operation.
Throw the ball to @htuch @stevenzzzz
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
|
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
htuch
left a comment
There was a problem hiding this comment.
Fantastic, this is really a huge contribution to xDS, our first on-demand API and a big enabler of scalability work. Thanks for all the work on review iterations as well.
|
Thank you everyone for reviews and feedback! |
Signed-off-by: Dmitri Dolguikh ddolguik@redhat.com
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description: Implements on-demand resolution of VirtualHosts via VHDS
Risk Level:
Testing:
Docs Changes:
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]