upstream: ensure rounded priority load assigned to healthy priority #4533
htuch merged 3 commits into envoyproxy:master
Ensures that when priority load is distributed between priorities and there is a rounding error, the remaining value is given to a healthy priority. Previously, this remainder was always given to priority 0, which could result in a completely unhealthy priority receiving non-zero load. This only really matters when the panic threshold is disabled, as otherwise the panic threshold would kick in at this point.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
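A minimal C++ sketch of the distribution described above, under simplifying assumptions: each priority's health is taken as an input percentage in 0..100 (in Envoy it is derived from the ratio of healthy hosts, scaled by an overprovisioning factor), and `computePriorityLoad` is a hypothetical name, not Envoy's API. Shares are computed with integer division, so they can add up to less than 100; the leftover goes to the first priority with non-zero health instead of P0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: splits 100% of traffic across priorities in proportion
// to their health (each entry assumed to be a percentage in 0..100).
std::vector<uint32_t> computePriorityLoad(const std::vector<uint32_t>& health) {
  std::vector<uint32_t> load(health.size(), 0);
  if (health.empty()) {
    return load;
  }
  // Total health is capped at 100 so that healthy high priorities absorb all load.
  const uint32_t total_health =
      std::min<uint32_t>(std::accumulate(health.begin(), health.end(), 0u), 100);
  if (total_health == 0) {
    // Assumption for this sketch: with no healthy priority, send everything to P0.
    load[0] = 100;
    return load;
  }
  uint32_t total_load = 100;
  int first_healthy_priority = -1;
  for (std::size_t i = 0; i < health.size(); ++i) {
    if (first_healthy_priority < 0 && health[i] > 0) {
      first_healthy_priority = static_cast<int>(i);
    }
    // Integer division truncates; this is where the rounding error comes from.
    load[i] = std::min<uint32_t>(total_load, health[i] * 100 / total_health);
    total_load -= load[i];
  }
  if (total_load != 0) {
    // Each truncation loses less than 1, so the remainder is below the priority count.
    assert(total_load < load.size());
    // Hand the remainder to the first healthy priority, never to an unhealthy P0.
    assert(first_healthy_priority >= 0);
    load[static_cast<std::size_t>(first_healthy_priority)] += total_load;
  }
  return load;
}
```

For example, health values {0, 33, 33, 33} (an empty P0) give a total health of 99, truncated shares of 33 each and a remainder of 1, so the result is {0, 34, 33, 33} rather than {1, 33, 33, 33}.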
Noticed this while reading the code, but on further thought this would explain some spurious UH failures we've seen during some of our traffic shifts that involve moving hosts from P0 to another priority (leaving P0 empty).
@dio do you mind taking a look at this one?
@junr03 sure.
dio left a comment
Thanks for taking this. Seems reasonable.
Just one question:
```diff
   }
 
   if (total_load != 0) {
-    // Account for rounding errors.
```
Can we keep this comment? And probably add more with your explanation in this PR desc. I think that will be useful for future readers.
Ah yeah, I wasn't intentionally trying to remove it. Will add it back.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
```diff
     // Account for rounding errors by assigning it to the first healthy priority.
     ASSERT(total_load < per_priority_load_.size());
-    per_priority_load_[0] += total_load;
+    per_priority_load_[first_healthy_priority] += total_load;
```
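The one-line change can be illustrated by running the same truncating split twice, once sending the leftover to index 0 (the removed line) and once to the first healthy priority (the added line). This is a sketch under the same simplified model as the description (health given as per-priority percentages); `splitLoad` and its `remainder_to_first_healthy` flag are hypothetical names used only for the comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical sketch: proportional split of 100% load with integer truncation.
// remainder_to_first_healthy == false mimics the old behaviour (remainder to P0).
std::vector<uint32_t> splitLoad(const std::vector<uint32_t>& health,
                                bool remainder_to_first_healthy) {
  std::vector<uint32_t> load(health.size(), 0);
  const uint32_t total_health =
      std::min<uint32_t>(std::accumulate(health.begin(), health.end(), 0u), 100);
  if (health.empty() || total_health == 0) {
    return load;  // Edge cases are out of scope for this comparison.
  }
  uint32_t total_load = 100;
  std::size_t first_healthy = 0;
  bool found = false;
  for (std::size_t i = 0; i < health.size(); ++i) {
    if (!found && health[i] > 0) {
      first_healthy = i;
      found = true;
    }
    load[i] = std::min<uint32_t>(total_load, health[i] * 100 / total_health);
    total_load -= load[i];
  }
  // total_health > 0 guarantees that a healthy priority was found above.
  load[remainder_to_first_healthy ? first_healthy : 0] += total_load;
  return load;
}
```

With health {0, 33, 33, 33}, the old placement yields {1, 33, 33, 33}, giving the empty P0 1% of the load, while the new placement yields {0, 34, 33, 33}.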
Can first_healthy_priority_ ever be -1 here? Should we have an ASSERT?
I don't think so, since we check earlier that total_load > 0 and skip this code otherwise. That said, I'll add an ASSERT just in case, to guard against future code changes.
```diff
   for (size_t i = 0; i < per_priority_health_.size(); ++i) {
     // Now assign as much load as possible to the high priority levels and cease assigning load when
     // total_load runs out.
+    if (first_healthy_priority < 0 && per_priority_health_[i]) {
```
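Whether first_healthy_priority can still be -1 when the remainder is assigned can also be checked mechanically. A hypothetical brute-force sketch, assuming the simplified model in which a total health of zero returns early: it enumerates health values 0, 10, ..., 100 for three priorities and confirms that whenever a rounding remainder is left, a healthy priority was found and the remainder is smaller than the number of priorities (the condition in the existing ASSERT). It only covers these small cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical check of the two invariants discussed in review:
//  1. a non-zero remainder implies some priority is healthy (index != -1);
//  2. the remainder is smaller than the number of priorities.
bool remainderInvariantHolds(const std::vector<uint32_t>& health) {
  const uint32_t total_health =
      std::min<uint32_t>(std::accumulate(health.begin(), health.end(), 0u), 100);
  if (total_health == 0) {
    return true;  // Early return: the remainder code is never reached.
  }
  uint32_t total_load = 100;
  int first_healthy_priority = -1;
  for (std::size_t i = 0; i < health.size(); ++i) {
    if (first_healthy_priority < 0 && health[i] > 0) {
      first_healthy_priority = static_cast<int>(i);
    }
    const uint32_t share = std::min<uint32_t>(total_load, health[i] * 100 / total_health);
    total_load -= share;
  }
  if (total_load == 0) {
    return true;  // No rounding remainder to place.
  }
  return first_healthy_priority >= 0 && total_load < health.size();
}

// Enumerates health values 0, 10, ..., 100 for three priorities.
bool invariantHoldsForAllSmallCases() {
  for (uint32_t a = 0; a <= 100; a += 10) {
    for (uint32_t b = 0; b <= 100; b += 10) {
      for (uint32_t c = 0; c <= 100; c += 10) {
        if (!remainderInvariantHolds({a, b, c})) {
          return false;
        }
      }
    }
  }
  return true;
}
```

Cases such as {10, 10, 10} do leave a remainder (three shares of 33), so the enumeration exercises the remainder branch rather than only the trivial paths.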
Signed-off-by: Snow Pettersen <snowp@squareup.com>
…nvoyproxy#4533)

Ensures that when priority load is distributed between priorities and
there is a rounding error the remaining value is given to a healthy
priority. Previously, this remainder would always be given to priority
0, which would result in a completely unhealthy priority having non-zero
load.

If P0 was selected due to a rounding error but was otherwise unhealthy, the
request would fail with UH as there are no healthy hosts available in the selected
priority (unless panic mode was triggered).

Signed-off-by: Snow Pettersen snowp@squareup.com
Risk Level: Medium
Testing: Unit test
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Aaltan Ahmad <aa@stripe.com>
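To see why a stray 1% on an empty P0 shows up as occasional UH failures, here is a sketch that maps a request's bucket in [0, 100) onto cumulative priority loads. Both helper names are hypothetical, and the cumulative-percentage scheme is an assumption about how a priority is picked from the load vector, not a copy of Envoy's selection code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps a bucket in [0, 100) onto cumulative load
// percentages and returns the chosen priority. Assumes a non-empty load vector.
std::size_t pickPriority(uint32_t bucket, const std::vector<uint32_t>& load) {
  uint32_t cumulative = 0;
  for (std::size_t i = 0; i < load.size(); ++i) {
    cumulative += load[i];
    if (bucket < cumulative) {
      return i;
    }
  }
  return load.size() - 1;  // Only reached if the loads sum to less than 100.
}

// Counts how many of the 100 buckets land on a priority with zero health;
// each such pick would fail with UH unless panic mode kicks in.
uint32_t bucketsHittingUnhealthy(const std::vector<uint32_t>& load,
                                 const std::vector<uint32_t>& health) {
  uint32_t failures = 0;
  for (uint32_t bucket = 0; bucket < 100; ++bucket) {
    if (health[pickPriority(bucket, load)] == 0) {
      ++failures;
    }
  }
  return failures;
}
```

With the old loads {1, 33, 33, 33} and health {0, 33, 33, 33}, one bucket in 100 lands on P0, which has no healthy hosts; with {0, 34, 33, 33} none do. Under uniformly random buckets, that is roughly 1% of requests failing, which fits the spurious UH failures mentioned in the thread.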