Skip to content

upstream: fix mid-batch thread-aware LB initialization out-of-bounds#43697

Merged
botengyao merged 1 commit into
envoyproxy:mainfrom
botengyao:fix_eds_crash
Mar 2, 2026
Merged

upstream: fix mid-batch thread-aware LB initialization out-of-bounds#43697
botengyao merged 1 commit into
envoyproxy:mainfrom
botengyao:fix_eds_crash

Conversation

@botengyao
Copy link
Copy Markdown
Member

Fixes a crash introduced by #43346.
When a thread-aware load balancer is initialized mid-batch, the per-priority panic tracking vectors have not been resized yet because the batch member update callback hasn't fired. This causes an out-of-bounds read during the LB refresh loop.

  1. Process any dirty priorities to properly size vectors if the load balancer is initialized mid-batch.
  2. Add a bounds check ASSERT to prevent silent out-of-bounds bit vector reads.
  3. Add an initialization regression test to prevent this pattern from breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
Copy link
Copy Markdown
Contributor

@wdauchy wdauchy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix

const auto& per_priority_state = (*per_priority_state_vector)[priority];
// Copy panic flag from LoadBalancerBase. It is calculated when there is a change
// in hosts set or hosts' health.
ASSERT(priority < per_priority_panic_.size());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be ASSERT (debug only builds) or RELEASE_ASSERT?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ASSERT is to catch the issue from tests, and it will have issues if this is not true in release, so this is fine, wdyt?

@nezdolik
Copy link
Copy Markdown
Member

nezdolik commented Mar 2, 2026

apart from one nit comment LGTM!

@botengyao botengyao merged commit 2bcebcc into envoyproxy:main Mar 2, 2026
29 checks passed
@botengyao botengyao deleted the fix_eds_crash branch March 3, 2026 03:24
bmjask pushed a commit to bmjask/envoy that referenced this pull request Mar 14, 2026
…nvoyproxy#43697)

Fixes a crash introduced by envoyproxy#43346.
When a thread-aware load balancer is initialized mid-batch, the
per-priority panic tracking vectors have not been resized yet because
the batch member update callback hasn't fired. This causes an
out-of-bounds read during the LB refresh loop.

1. Process any dirty priorities to properly size vectors if the load
balancer is initialized mid-batch.
2. Add a bounds check `ASSERT` to prevent silent out-of-bounds bit
vector reads.
3. Add an initialization regression test to prevent this pattern from
breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by
`envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update`).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
Signed-off-by: bjmask <11672696+bjmask@users.noreply.github.com>
bvandewalle pushed a commit to bvandewalle/envoy that referenced this pull request Mar 17, 2026
…nvoyproxy#43697)

Fixes a crash introduced by envoyproxy#43346. 
When a thread-aware load balancer is initialized mid-batch, the
per-priority panic tracking vectors have not been resized yet because
the batch member update callback hasn't fired. This causes an
out-of-bounds read during the LB refresh loop.

1. Process any dirty priorities to properly size vectors if the load
balancer is initialized mid-batch.
2. Add a bounds check `ASSERT` to prevent silent out-of-bounds bit
vector reads.
3. Add an initialization regression test to prevent this pattern from
breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by
`envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update`).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
fishcakez pushed a commit to fishcakez/envoy that referenced this pull request Mar 25, 2026
…nvoyproxy#43697)

Fixes a crash introduced by envoyproxy#43346. 
When a thread-aware load balancer is initialized mid-batch, the
per-priority panic tracking vectors have not been resized yet because
the batch member update callback hasn't fired. This causes an
out-of-bounds read during the LB refresh loop.

1. Process any dirty priorities to properly size vectors if the load
balancer is initialized mid-batch.
2. Add a bounds check `ASSERT` to prevent silent out-of-bounds bit
vector reads.
3. Add an initialization regression test to prevent this pattern from
breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by
`envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update`).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
henrymwang pushed a commit to DataDog/envoy that referenced this pull request Apr 13, 2026
…nvoyproxy#43697)

Fixes a crash introduced by envoyproxy#43346.
When a thread-aware load balancer is initialized mid-batch, the
per-priority panic tracking vectors have not been resized yet because
the batch member update callback hasn't fired. This causes an
out-of-bounds read during the LB refresh loop.

1. Process any dirty priorities to properly size vectors if the load
balancer is initialized mid-batch.
2. Add a bounds check `ASSERT` to prevent silent out-of-bounds bit
vector reads.
3. Add an initialization regression test to prevent this pattern from
breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by
`envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update`).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
krinkinmu pushed a commit to grnmeira/envoy that referenced this pull request Apr 20, 2026
…nvoyproxy#43697)

Fixes a crash introduced by envoyproxy#43346. 
When a thread-aware load balancer is initialized mid-batch, the
per-priority panic tracking vectors have not been resized yet because
the batch member update callback hasn't fired. This causes an
out-of-bounds read during the LB refresh loop.

1. Process any dirty priorities to properly size vectors if the load
balancer is initialized mid-batch.
2. Add a bounds check `ASSERT` to prevent silent out-of-bounds bit
vector reads.
3. Add an initialization regression test to prevent this pattern from
breaking in the future.

Commit Message:
Additional Description:
Risk Level: low (already guarded by
`envoy.reloadable_features.coalesce_lb_rebuilds_on_batch_update`).
Testing:
Docs Changes:
Release Notes:

Signed-off-by: Boteng Yao <boteng@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants