Skip to content

upstream: ensure rounded priority load assigned to healthy priority#4533

Merged
htuch merged 3 commits intoenvoyproxy:masterfrom
snowp:priority-rounding
Oct 2, 2018
Merged

upstream: ensure rounded priority load assigned to healthy priority#4533
htuch merged 3 commits intoenvoyproxy:masterfrom
snowp:priority-rounding

Conversation

@snowp
Copy link
Copy Markdown
Contributor

@snowp snowp commented Sep 26, 2018

Ensures that when priority load is distributed between priorities and
there is a rounding error the remaining value is given to a healthy
priority. Previously, this remainder would always be given to priority
0, which would result in a completely unhealthy priority having non-zero
load.

If P0 was selected due to a rounding error but was otherwise unhealthy, the
request would fail with UH as there are no healthy hosts available in the selected
priority (unless panic mode was triggered).

Signed-off-by: Snow Pettersen snowp@squareup.com

Risk Level: Medium
Testing: Unit test
Docs Changes: n/a
Release Notes: n/a

Ensures that when priority load is distributed between priorities and
there is a rounding error the remaining value is given to a healthy
priority. Previously, this remainder would always be given to priority
0, which would result in a completely unhealthy priority having non-zero
load.

This only really matters when panic threshold is disabled, as
otherwise panic threshold would otherwise kick in at this point.

Signed-off-by: Snow Pettersen <snowp@squareup.com>
@snowp
Copy link
Copy Markdown
Contributor Author

snowp commented Sep 26, 2018

Noticed this while reading the code, but on further thought this would explain some spurious UH failures we've seen during some our traffic shifts that involve moving hosts from P0 to another priority (leaving P0 empty)

@junr03
Copy link
Copy Markdown
Member

junr03 commented Sep 26, 2018

@dio do you mind taking a look at this one?

@dio
Copy link
Copy Markdown
Member

dio commented Sep 26, 2018

@junr03 sure.

Copy link
Copy Markdown
Member

@dio dio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this. Seems reasonable.

Just one question:

}

if (total_load != 0) {
// Account for rounding errors.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this comment? And probably add more with your explanation in this PR desc. I think that will be useful for future readers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I wasn't intentionally trying to remove it. Will add back

Signed-off-by: Snow Pettersen <snowp@squareup.com>
dio
dio previously approved these changes Sep 26, 2018
Copy link
Copy Markdown
Member

@dio dio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks, @snowp!

@htuch htuch self-assigned this Sep 26, 2018
// Account for rounding errors by assigning it to the first healthy priority.
ASSERT(total_load < per_priority_load_.size());
per_priority_load_[0] += total_load;
per_priority_load_[first_healthy_priority] += total_load;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can first_healthy_priority_ ever be -1 here? Should we have an ASSERT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so due to us previously checking that total_load > 0 and skipping this code. That said, I'll add an ASSERT just in case to guard against future code changes.

for (size_t i = 0; i < per_priority_health_.size(); ++i) {
// Now assign as much load as possible to the high priority levels and cease assigning load when
// total_load runs out.
if (first_healthy_priority < 0 && per_priority_health_[i]) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: per_priority_health_[i] > 0

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@htuch htuch merged commit c0f6a66 into envoyproxy:master Oct 2, 2018
aa-stripe pushed a commit to aa-stripe/envoy that referenced this pull request Oct 11, 2018
…nvoyproxy#4533)

Ensures that when priority load is distributed between priorities and
there is a rounding error the remaining value is given to a healthy
priority. Previously, this remainder would always be given to priority
0, which would result in a completely unhealthy priority having non-zero
load.

If P0 was selected due to a rounding error but was otherwise unhealthy, the
request would fail with UH as there are no healthy hosts available in the selected
priority (unless panic mode was triggered).

Signed-off-by: Snow Pettersen snowp@squareup.com

Risk Level: Medium
Testing: Unit test
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Aaltan Ahmad <aa@stripe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants