cluster: copy active cluster hosts during warming #6021
ramaraochavali wants to merge 16 commits into envoyproxy:master
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
```cpp
        host_set->hostsPerLocality().clone();
    cluster_entry.cluster_->prioritySet().updateHosts(
        priority, HostSetImpl::partitionHosts(hosts_copy, hosts_per_locality_copy),
        host_set->localityWeights(), {}, {}, absl::nullopt);
```
I have passed absl::nullopt for the overprovisioning factor on the assumption that it should use the same overprovisioning factor that the active cluster hosts were built with. If the overprovisioning factor is changed, should we expect the management server to send an EDS update? Or should I pass whatever is in the warming cluster config?
How would the warming cluster know what overprovisioning value the active cluster has? I think in this case you're creating a new HostSet with the default overprovisioning factor. In any case, could we add test coverage to verify that we're preserving the factor?
The active_host_set has the overprovisioningFactor preserved, so we can pass it. I will change it to do that.
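For concreteness, a minimal sketch of the copy with the factor preserved, assuming the PrioritySet/HostSetImpl APIs shown in the diff above (the variable names are illustrative, not the final code):

```cpp
// Copy each active host set into the warming cluster's priority set, passing
// the active set's overprovisioning factor instead of absl::nullopt (which
// would fall back to the default factor).
for (size_t priority = 0; priority < active_host_sets.size(); ++priority) {
  const auto& active_host_set = active_host_sets[priority];
  HostVectorConstSharedPtr hosts_copy(new HostVector(active_host_set->hosts()));
  HostsPerLocalityConstSharedPtr hosts_per_locality_copy =
      active_host_set->hostsPerLocality().clone();
  warming_cluster.prioritySet().updateHosts(
      priority, HostSetImpl::partitionHosts(hosts_copy, hosts_per_locality_copy),
      active_host_set->localityWeights(), /*hosts_added=*/{}, /*hosts_removed=*/{},
      active_host_set->overprovisioningFactor());
}
```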
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
/retest
🔨 rebuilding
/retest
🐴 hold your horses - no failures detected, yet.
@snowp for first pass, I figure Snow is our resident expert here.
snowp left a comment:
Thanks for working on this!
```cpp
const auto active_it = active_clusters_.find(cluster_name);
if (active_it != active_clusters_.end()) {
  auto& active_cluster_entry = *active_it->second;
  if (active_cluster_entry.cluster_->prioritySet().hostSetsPerPriority().size() != 0 &&
```
This check seems pretty complicated - is it just a roundabout way of checking whether the active cluster is empty and that the warming one is not? Should we just add an empty() call to PrioritySet instead?
Also, how does this check work with the fact that we always create a P=0 HostSet in the ClusterImplBase ctor? Are there actually any scenarios in which we'll hit this code and the active cluster has no HostSets?
> is it just a roundabout way of checking whether the active cluster is empty and that the warming one is not?

It is the other way around: it checks whether the active cluster has hosts and the warming cluster does not. I think it makes sense to add an empty() method to PrioritySet. Let me try that.
> Also, how does this check work with the fact that we always create a P=0 HostSet in the ClusterImplBase ctor? Are there actually any scenarios in which we'll hit this code and the active cluster has no HostSets?
Yes. For active clusters, it does not copy anything if there are no hosts.
```cpp
EXPECT_EQ(cluster2->info_, cluster_manager_->get("fake_cluster")->info());
EXPECT_EQ(1UL, cluster_manager_->clusters().size());

// Validate that the host updates are pushed to tls clusters.
```
Wouldn't these hosts be present on the TLS already for the existing cluster? In fact, I think we'd want to avoid posting to the TLS cluster since the host sets haven't actually changed, to avoid the extra work of rebuilding all the TLS state.
I think there are some helpers set up in this file to verify when cluster updates are posted; those might be useful here to make sure we're calling it when expected.
I was not sure what other TLS state would need to be passed. Agree that the hosts would be there. But I could not find a way to check and control the postThreadLocalClusterUpdate call. LMK if there is an easy way to do it - since this is a rare case, maybe it is OK to push the TLS state if controlling it is complicated?
If you use the MockedUpdatedClusterManagerImpl defined in cluster_manager_impl_test then you can add an EXPECT_CALL on local_cluster_update_ to verify when the TLS posting happens.
I think the first step is to verify whether we're seeing an additional TLS update by checking it in a test; then we can discuss separately whether we want to add additional logic to prevent duplicate updates (if that is even happening).
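A minimal sketch of that kind of expectation, assuming the local_cluster_update_ mock from cluster_manager_impl_test (the wildcard matchers are illustrative):

```cpp
// Expect exactly one TLS post after warming completes; a second invocation
// would indicate the host copy is triggering a redundant thread-local update.
EXPECT_CALL(local_cluster_update_, post(_, _, _)).Times(1);
```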
I am sure that is happening because there are no checks to prevent it. I will add a test to confirm.
```cpp
const auto& host_sets =
    active_cluster_entry.cluster_->prioritySet().hostSetsPerPriority();
for (size_t priority = 0; priority < host_sets.size(); ++priority) {
  const auto& host_set = host_sets[priority];
```
I think this would be easier to read if you prefixed warming or active in front of these variables, e.g. active_host_sets; it's kinda hard to keep track of what is what at the moment.
```cpp
    active_cluster_entry.cluster_->prioritySet().hostSetsPerPriority();
for (size_t priority = 0; priority < host_sets.size(); ++priority) {
  const auto& host_set = host_sets[priority];
  HostVectorConstSharedPtr hosts_copy(new HostVector(host_set->hosts()));
```
Do we need to copy this here? I think partitionHosts already makes copies?
I tried, but did not know how to do it - I think it may be related to this comment. LMK if you have a better idea on how to do it?

Ah, the issue is that partitionHosts takes a vector of const pointers, while hosts() returns non-const pointers. In that case, mind adding a TODO here as well? I think it would make sense to look into optimizing away these copies at a later time.
Added the TODO.
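The copy with the suggested TODO might look like this (a sketch based on the diff above; the exact TODO wording is illustrative):

```cpp
// TODO: avoid these copies. partitionHosts takes a vector of const host
// pointers while hosts() returns non-const pointers, which currently forces
// copying both the host vector and the per-locality structure.
HostVectorConstSharedPtr hosts_copy(new HostVector(host_set->hosts()));
HostsPerLocalityConstSharedPtr hosts_per_locality_copy =
    host_set->hostsPerLocality().clone();
```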
```cpp
const auto& host_set = host_sets[priority];
HostVectorConstSharedPtr hosts_copy(new HostVector(host_set->hosts()));
HostsPerLocalityConstSharedPtr hosts_per_locality_copy =
    host_set->hostsPerLocality().clone();
```
Same question, do we actually need to clone here?
Same reason as above.
```cpp
// Otherwise, applyUpdates() will fire with a dangling cluster reference.
updates_map_.erase(cluster_name);

// If the active cluster has priority sets and the warming cluster does not have them, it
```
There's only one priority set per cluster, so this wording is a bit strange
Yeah, I meant hosts - will change.
```cpp
updates_map_.erase(cluster_name);

// If the active cluster has priority sets and the warming cluster does not have them, it
// means that onConfigUpdate is triggered for EDS type of cluster is triggered with no
```
Maybe means that onConfigUpdate was triggered by an EDS update that had no references to this cluster?
```cpp
// If the active cluster has priority sets and the warming cluster does not have them, it
// means that onConfigUpdate is triggered for EDS type of cluster is triggered with no
// reference to this cluster. In such cases, copy the active cluster priority set to the
// warming cluster otherwise the active cluster hosts will be cleared after warming is
```
nit: I'd rephrase this to say something like to prevent the hosts from being cleared after warming
```cpp
// completed. See https://github.com/envoyproxy/envoy/issues/5168 for more context.
const auto active_it = active_clusters_.find(cluster_name);
if (active_it != active_clusters_.end()) {
  auto& active_cluster_entry = *active_it->second;
```
nit: I think this can be const?
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
/retest
🔨 rebuilding
```cpp
EXPECT_CALL(*cluster2, initializePhase()).Times(0);

// Validate that TLS updates are triggered correctly after warming.
EXPECT_CALL(local_cluster_update_, post(_, _, _))
```
This actually tells me that we're not triggering a TLS update when we call updateHosts in this PR, since we're only seeing this call once. This call is coming from when we explicitly call postThreadLocalClusterUpdate in onClusterInit. I believe we're not incurring the extra call because the call to updateHosts comes before we've wired up the callback that triggers TLS updates.
Yes, that is correct: updateHosts does not trigger a TLS update. Do you think we are good with this PR?
Yeah, thanks for taking the time to verify this behavior.
/retest
🔨 rebuilding
snowp left a comment:
Thanks for iterating, I think this makes sense.
```cpp
bool empty() const override {
  for (auto const& host_set : host_sets_) {
    if (host_set->hosts().size() > 0) {
      return false;
    }
  }
  return true;
}
```
```cpp
// Otherwise, applyUpdates() will fire with a dangling cluster reference.
updates_map_.erase(cluster_name);

// If the active cluster has hosts in priority set and the warming cluster does not have them,
```
I didn't follow this conversation, so this was probably already discussed, but isn't this dangerous? Isn't there a case in which this can lead to stale hosts that the management server legitimately does not want the Envoy to have? (Something about this seems off to me.)
Yeah I think you're right: There's nothing preventing us from doing this copying for a legitimate zero host EDS update following the CDS update, so I think we're missing a piece here before we can merge this. WDYT @ramaraochavali?
That is a fair point. My intention was to handle this case: https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/eds.cc#L48 - where onConfigUpdate was called without a reference to this cluster. I thought I read in the xDS protocol docs that when empty hosts are sent we should work with the existing data. Reading it again - "When a requested resource is missing in a RDS or EDS update, Envoy will retain the last known value for this resource." - I misinterpreted this as applying to empty hosts; it is essentially talking about the case I am trying to fix.
I will add an additional condition to trigger this only when onConfigUpdate is called from the above flow.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@mattklein123 @snowp handled the case of a zero-host EDS update and added a test to verify the behaviour as well. Should be good now, PTAL.
snowp left a comment:
Thanks, I think this makes it more robust. I also think you probably want to up the risk level of this change to a medium at the very least, considering we almost subtly broke EDS updating.
```cpp
Outlier::Detector* outlierDetector() override { return outlier_detector_.get(); }
const Outlier::Detector* outlierDetector() const override { return outlier_detector_.get(); }
void initialize(std::function<void()> callback) override;
bool isBeingInitializedByEmptyConfigUpdate() const override { return false; }
```
maybe initializedByEmptyConfig? Since this will continue to return true after initialization
Makes sense. Will change.
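A sketch of how such a flag might gate the copy in the cluster manager, combining the empty() helper and the rename suggested above (copyActiveClusterHosts is a hypothetical helper; the surrounding code is illustrative, not the final diff):

```cpp
// Only copy hosts from the active cluster when warming completed because of an
// empty EDS config update; a legitimate zero-host update must still clear hosts.
const auto active_it = active_clusters_.find(cluster_name);
if (active_it != active_clusters_.end() &&
    cluster_entry.cluster_->initializedByEmptyConfig() &&
    !active_it->second->cluster_->prioritySet().empty() &&
    cluster_entry.cluster_->prioritySet().empty()) {
  // Hypothetical helper wrapping the per-priority copy shown earlier.
  copyActiveClusterHosts(*active_it->second, cluster_entry);
}
```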
```cpp
// Otherwise, applyUpdates() will fire with a dangling cluster reference.
updates_map_.erase(cluster_name);

// If onConfigUpdate was triggered by an EDS update that had no references to this
```
I think I'm still missing something here. Why isn't the management server responding with the hosts that the Envoy should have? Can you potentially add more comments or we can discuss in this comment thread until we are all on the same page?
In reading through #5168 I'm wondering why this isn't a management server issue. Can you just not have your management server enter the LB pool until it's fully ready to serve requests?
I think the best way to think this through is with a concrete example. Let's imagine we're switching the backend from port 80 to 443 (EDS) and from HTTP to HTTPS (CDS).
@mattklein123 are you concerned that we never see the 443 EDS host update? Or that we temporarily won't see it during the config update?
An example would be great. My concern here is that it's not clear to me that this is an Envoy issue vs. a management server issue. I.e., why is the management server accepting connections if it can't actually serve them? Is there a chance that this workaround is going to change legitimate behavior, such as the management server wanting to remove all hosts when a cluster has been refreshed, etc.?
I think I would mainly just like to better understand why we think this is the right fix to do here.
My thinking is that the management server always has the capability to send an empty host update (or the 443 hosts in the above example). There may be a period of inconsistency, but arguably that's inevitable when not doing make-before-break consistency.
I think I'm probably missing something fundamental, so apologies, but why would the management server send an empty update if there are hosts?
This is the example you gave:
> Is there a chance that this workaround is going to change legitimate behavior such as the management server wanting to remove all hosts when a cluster has been refreshed, etc.
@mattklein123 just to be clear, the management server is not sending an empty update for this cluster if there are hosts. It sent a ClusterLoadAssignment response for some other cluster (there is no reference to this cluster in the response), and we call onConfigUpdate for this cluster with an empty resource list because we did not find any resources for this cluster: https://github.com/envoyproxy/envoy/blob/master/source/common/config/grpc_mux_impl.cc#L171
Normally this is fine: if the cluster is initialized we just increment the stat and onPreInit does nothing. But if it is in the initializing (warming) state, the empty update clears the cluster's hosts.
So I think it is definitely an Envoy issue, and it is not honouring this statement in the protocol doc specifically: "When a requested resource is missing in a RDS or EDS update, Envoy will retain the last known value for this resource."
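A self-contained sketch of the mechanism being described, with simplified stand-in types (this is not the actual grpc_mux_impl code, just an illustration of the flow):

```cpp
#include <string>
#include <vector>

// Stand-in for an EDS resource naming a cluster.
struct Resource {
  std::string cluster_name;
};

struct EdsWatch {
  std::string cluster_name;
  bool warming = false;
  std::vector<Resource> hosts;

  void onConfigUpdate(const std::vector<Resource>& resources) {
    if (resources.empty() && warming) {
      hosts.clear(); // the problematic behavior: a warming cluster loses its hosts
    }
    // An initialized cluster just bumps a stat on an empty update (not shown).
  }
};

// Every subscribed watch is notified for every response; a watch whose cluster
// is not referenced in the response receives an empty resource list.
void deliverResponse(std::vector<EdsWatch>& watches,
                     const std::vector<Resource>& response) {
  for (auto& watch : watches) {
    std::vector<Resource> matched;
    for (const auto& r : response) {
      if (r.cluster_name == watch.cluster_name) {
        matched.push_back(r);
      }
    }
    watch.onConfigUpdate(matched);
  }
}
```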
Added more docs. See if that makes sense.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
/retest
🔨 rebuilding
```cpp
// Otherwise, applyUpdates() will fire with a dangling cluster reference.
updates_map_.erase(cluster_name);

// If management server sends a EDS response, for any other cluster grpc_mux_impl calls
```
@ramaraochavali @snowp @htuch you all are clearly fine with this, so I'm not understanding something, but why does an EDS response for any other cluster cause an empty update for this cluster? Isn't that a bug?
OK, I think I understand: we are looking at this text in the xDS protocol docs: "When a requested resource is missing in a RDS or EDS update, Envoy will retain the last known value for this resource." Right?
I guess this seems OK, though I do wonder if this clause is adding extra complexity that we don't really need, especially with incremental xDS coming. Meaning, it seems like your management server could simply not accept new connections until it's ready to serve them, and we could avoid all of this extra code, potential for bugs, behavior differences, etc. That would be my preference.
If we do decide to keep this, can we spell out the clause we are relying on here in the docs and link to them?
> we are looking at this text in the xDS protocol docs: "When a requested resource is missing in a RDS or EDS update, Envoy will retain the last known value for this resource." Right?

Yes.

> If we do decide to keep this, can we spell out the clause we are relying on here in the docs and link to them?

I can add the clause to the doc, but my preference would be to fix this on the Envoy side - it should not clear hosts when it gets an EDS response for some other cluster. For EDS/RDS we are not mandating that all resources be sent in every response (which is the right thing to do). Things may change with incremental xDS, but we can deal with that when it comes?
@ramaraochavali we just discussed this issue briefly on the community call. I think we might need to have a meeting on this. Given that chat, I'm still pretty uncomfortable with this change, for a few reasons:
- It's not clear to me that it's correct to copy the hosts from an old cluster to a new cluster.
- It adds a bunch of complexity that might lead to hard-to-understand edge cases.
- It really seems to me like you could fix this in your management server: don't accept connections to a management server (do not report healthy to your LB) until the management server can fully serve connections.
I know @htuch has some thoughts here, so will let him weigh in and then we can maybe do a meeting if needed?
Maybe one way to simplify this discussion is to come up with a bunch of user stories, e.g. "As an Envoy user, I want to upgrade a service from HTTP to HTTPS... the recommended steps are ...".
> The management server can still do an EDS update immediately after CDS; even if we reuse existing EDS resources, we just get a period of potential inconsistency.
> I think it comes down to whether a management server can guarantee that an EDS update following a CDS update truly reflects the new CDS state. In some setups this is true, but in others it isn't.

I agree with this, and it seems potential inconsistency cannot be avoided in all cases.
I think we need to think about a short-term solution and longer-term solutions like immutable resources etc. For the short term we have a couple of options:
- Reuse existing EDS resources until the management server provides a new set of resources matching the CDS update (whether a CDS update actually requires a new set of EDS resources is another question - right now we are assuming that all CDS updates require an EDS response, which does not look correct).
- Do not finish warming until a named response arrives for the updated warming cluster.
Do we have any other ideas?
Longer term, for sure we should come up with a bunch of user stories and think about solutions like immutable resources etc. as @htuch mentioned.
@ramaraochavali take a look at a convo I had with @htuch in Envoy Slack: https://envoyproxy.slack.com/archives/CEFDKQ3RQ/p1551391835017700
I'm pretty strongly of the opinion that we should not finish init/warming until we explicitly receive a named response for an EDS fetch. I think this is the clearest solution and IMO the behavior that most people would expect.
@htuch brings up the valid point that this may not work in all cases if the hosts have changed, but IMO this is very rare, and we should offer guidance in the xDS docs that this edge case should be handled by a cluster rename and correct sequencing.
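A hypothetical sketch of that direction, with simplified stand-in types (this is not the actual EDS code; the eventual fix was done in a separate PR):

```cpp
#include <optional>
#include <string>
#include <vector>

// Stand-in for an EDS ClusterLoadAssignment resource (endpoints elided).
struct ClusterLoadAssignment {
  std::string cluster_name;
};

struct WarmingEdsCluster {
  std::string name;
  bool warmed = false;
  std::optional<ClusterLoadAssignment> last_known;

  // Finish warming only when the response explicitly names this cluster.
  void onConfigUpdate(const std::vector<ClusterLoadAssignment>& resources) {
    for (const auto& r : resources) {
      if (r.cluster_name == name) {
        last_known = r; // apply the named assignment
        warmed = true;  // a named response arrived: warming can complete
        return;
      }
    }
    // No reference to this cluster: retain the last known value and keep warming.
  }
};
```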
@mattklein123 @htuch @snowp Created this doc https://docs.google.com/document/d/1Ca4YX9XNnV9rjTR7U0_xyCxUDKfoSun8pWSUNO9N-tY/edit?usp=sharing so that we can discuss there.
OK. So since we got consensus on finishing warming only on named responses, I am going to close this PR and open another PR with a fix.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Description: Copies active cluster hosts during the warming-complete phase, if warming completion is triggered by an empty update.
Risk Level: Medium
Testing: Added a test case covering this scenario.
Docs Changes: N/A
Release Notes: N/A
Fixes #5168