
[bp/1.35] Flakes stack #44685

Closed
Copilot wants to merge 8 commits into release/v1.35 from copilot/backport-flake-fix-to-release-v1-35

Copilot AI commented Apr 27, 2026

Backport of the flake-fix stack from #44644 ([bp/1.37] Flakes stack) to release/v1.35. 7 of the 8 commits apply; #43711 is intentionally skipped (see below).

Cherry-picked commits

  1. #44508 test: Fix ws integration test flake (drops the cpu:3/large size tags, adds timeout-driven websocket upgrade tests)
  2. #44518 test/ws: Follow up fixes from #44508
  3. #44529 test/ws: Increase sharding further (still flaking under MSAN)
  4. #44584 test/idle_timeout: Fix flake (scales RequestTimeoutMs by TIMEOUT_FACTOR)
  5. #44500 quic: Remove flaky ASSERT in ShouldCreateOutgoingBidirectionalStream (the ASSERT was incorrect because the parent can legitimately return false; fixes MSAN flakes in buffer_accounting_integration_test)
  6. #44510 test: Attempt to fix flake in grpc transcoder test (splits the combined test)
  7. #43083 lrs: proper shutdown of the LRS stream (adds a ~LoadStatsReporter destructor that disables timers and closes the gRPC stream before destruction, fixing a shutdown race with ADS)
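The shutdown ordering described for #43083 can be sketched as below. This is a minimal illustration under stated assumptions: `FakeTimer`, `FakeStream` and `LoadStatsReporterSketch` are hypothetical stand-ins, not Envoy's `LoadStatsReporter`, timer, or gRPC stream types, and the method names are invented. It only shows the general idea of stopping timers and closing the stream in the destructor body, which runs before any member is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an event-loop timer (not Envoy's timer API).
class FakeTimer {
public:
  explicit FakeTimer(std::vector<std::string>* log) : log_(log) {}
  void disable() {
    enabled_ = false;
    log_->push_back("timer disabled");
  }
  bool enabled() const { return enabled_; }

private:
  std::vector<std::string>* log_;
  bool enabled_ = true;
};

// Hypothetical stand-in for a bidirectional gRPC stream (not Envoy's API).
class FakeStream {
public:
  explicit FakeStream(std::vector<std::string>* log) : log_(log) {}
  void close() {
    open_ = false;
    log_->push_back("stream closed");
  }
  bool open() const { return open_; }

private:
  std::vector<std::string>* log_;
  bool open_ = true;
};

// Hypothetical reporter: the destructor body runs before members are
// destroyed, so it is where in-flight work is stopped explicitly.
class LoadStatsReporterSketch {
public:
  explicit LoadStatsReporterSketch(std::vector<std::string>* log)
      : log_(log), timer_(std::make_unique<FakeTimer>(log)),
        stream_(std::make_unique<FakeStream>(log)) {}

  ~LoadStatsReporterSketch() {
    // 1. Stop the timer so no new report is scheduled during teardown.
    timer_->disable();
    // 2. Close the stream so no stream callback targets this object later.
    if (stream_->open()) {
      stream_->close();
    }
    log_->push_back("reporter destroyed");
  }

private:
  std::vector<std::string>* log_;
  std::unique_ptr<FakeTimer> timer_;
  std::unique_ptr<FakeStream> stream_;
};
```

Doing this explicitly in the destructor, rather than relying on implicit member destruction, keeps the teardown order visible and independent of how the members happen to be declared.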

Skipped

#43711 (dns test: improve tests by teardown the server and upstream): dns_filter_integration_test.cc on release/v1.35 uses the cares resolver and lacks both OsSysCallsWithMockedDns and the setupResponseParser-based fixture, so the teardown patch does not apply cleanly, and the flake context differs.

Conflict resolutions

Related

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh gh auth status (http block)
  • https://api.github.com/repos/envoyproxy/envoy/pulls/44685
    • Triggering command: /usr/bin/curl curl -s -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"title": "[bp/1.35] Flakes stack", "draft": false} (http block)
    • Triggering command: /usr/bin/curl curl -v -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json REDACTED -d {"title": "[bp/1.35] Flakes stack", "draft": false} (http block)
    • Triggering command: /usr/bin/curl curl -s -o /tmp/pr_update_result.json -w %{http_code} -X PATCH -H Authorization: token ****** -H Accept: application/vnd.github.v3+json -H Content-Type: application/json REDACTED -d {"title": "[bp/1.35] Flakes stack"} (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Backport the flake-fix stack from PR #44644 (which targeted release/v1.37) to release/v1.35.

Cherry-pick the following 7 upstream commits in order from main onto release/v1.35 (note: one of the 8 commits from the 1.37 backport is intentionally skipped for 1.35 — see below):

  1. 1b7af978b8819e08d2a804556d2ca52d77a0169c (#44508): test: Fix ws integration test flake
  2. 095e13d2740727c31e781f7abc4fb67ab1dd04ac (#44518): test/ws: Follow up fixes from #44508
  3. The squash-merge commit for #44529: test/ws: Increase sharding further
  4. 080522bf396dd344fe28dfc98c29eea4e7fab012 (#44584): test/idle_timeout: Fix flake
  5. The squash-merge commit for #44500: quic: Remove flaky ASSERT in ShouldCreateOutgoingBidirectionalStream
  6. The squash-merge commit for #44510: test: Attempt to fix flake in grpc transcoder test
  7. The squash-merge commit for #43083: lrs: proper shutdown of the LRS stream

Skip: #43711 (dns test: improve tests by teardown the server and upstream). The test/extensions/filters/udp/dns_filter/dns_filter_integration_test.cc on release/v1.35 is structurally different: it uses the cares resolver instead of getaddrinfo, and lacks both the OsSysCallsWithMockedDns mock and the setupResponseParser-based fixture. The teardown patch does not apply cleanly and the original flake context differs, so skip it rather than hand-port it.

You can find the exact merge SHAs for the squash commits by looking at the commits in phlax:bp-1.37-flakes (the source branch of #44644). The list is available via /repos/envoyproxy/envoy/pulls/44644/commits.

Verification done already against release/v1.35:

Workflow:

  1. Create a branch bp-1.35-flakes from release/v1.35.
  2. Cherry-pick the 7 commits in the order listed (skip #43711).
  3. Resolve any trivial conflicts. For changelog entries, note that the 1.35 changelog lives in the 1.35 dev-cycle yaml; if the LRS bug-fix entry doesn't apply cleanly, drop or relocate it as appropriate for 1.35's changelog layout.
  4. Push and open a PR titled [bp/1.35] Flakes stack against base release/v1.35.
  5. Add the standard backport label used on recent [bp/1.35] PRs (check recent merged 1.35 backports for reference).
  6. PR body should:

Do NOT open any issues. Just the PR.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: this backported a load of flakes to 1.37 - wondering if all are correct/needed/fix something etc? for other supported branches - 1.35 and 1.36 (release/v{x})

Assistant: ## Summary of [bp/1.37] Flakes stack (#44644)

The PR backported 8 upstream commits to release/v1.37. Here they are with applicability to release/v1.36 and release/v1.35:

| # | Upstream PR | Change | Risk | Fixes / motivation |
|---|---|---|---|---|
| 1 | #44508 | test: Fix ws integration test flake (fixes 200ms timeout race; drops cpu:3/large size in test/integration/BUILD) | test-only | reduce flakes in websocket_integration_test |
| 2 | #44518 | test/ws: Follow up fixes from #44508 | test-only | follow up to #44508 |
| 3 | #44529 | test/ws: Increase sharding further | test-only | still flaking under MSAN |
| 4 | #44584 | test/idle_timeout: Fix flake (scale RequestTimeoutMs by TIMEOUT_FACTOR) | test-only | fixes #26979 (long-standing) |
| 5 | #44500 | quic: Remove flaky ASSERT in ShouldCreateOutgoingBidirectionalStream | source, low risk (removes an ASSERT) | fixes #41526 (flakes in buffer_accounting_integration_test under MSAN) |
| 6 | [#44510](#44... | | | |

This pull request was created from Copilot chat.

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #44685 was opened by Copilot.

phlax and others added 7 commits April 27, 2026 18:47
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Signed-off-by: Ryan Northey <ryan@synca.io>
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
as this can still flake in msan

Signed-off-by: Ryan Northey <ryan@synca.io>
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Scale RequestTimeoutMs by TIMEOUT_FACTOR to match IdleTimeoutMs. The
hard-coded 200ms races test setup on loaded/sanitizer runs, causing the
request timer to fire before the test body executes.

fix #26979

Signed-off-by: Ryan Northey <ryan@synca.io>
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
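The idea in this commit, deriving the request timeout from the same scaling factor as the idle timeout instead of hard-coding 200ms, can be sketched as follows. `kTimeoutFactor`, `kBaseRequestTimeoutMs` and `scaledTimeout` are hypothetical names; the value 3 is only an assumed stand-in for whatever TIMEOUT_FACTOR expands to in a given (for example, sanitizer) build.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the TIMEOUT_FACTOR test macro. The real value
// depends on the build configuration; 3 is an assumption for illustration.
constexpr int64_t kTimeoutFactor = 3;

// The previously hard-coded request timeout that could fire before the
// test body ran on loaded or sanitizer builds.
constexpr int64_t kBaseRequestTimeoutMs = 200;

// Scale a base timeout by the build's factor so slow runs get
// proportionally more headroom instead of a fixed 200ms.
constexpr std::chrono::milliseconds scaledTimeout(int64_t base_ms,
                                                  int64_t factor = kTimeoutFactor) {
  return std::chrono::milliseconds(base_ms * factor);
}
```

With both timeouts scaled by the same factor, as the commit message describes, a slower build stretches them together rather than leaving the request timeout fixed.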
quic: Remove flaky ASSERT in ShouldCreateOutgoingBidirectionalStream (#44500)

QuicSpdyClientSession::ShouldCreateOutgoingBidirectionalStream() can
legitimately return false, but we return true unconditionally to avoid a
nullptr deref in QuicHttpClientConnectionImpl::newStream.

The existing ASSERT on the parent's return value was therefore incorrect
and triggered flakes (e.g. in buffer_accounting_integration_test on
MSAN). Drop the ASSERT and update the comment.

Fixes #41526

---------

Signed-off-by: Ryan Northey <ryan@synca.io>
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
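Why the ASSERT was incorrect can be modelled roughly as below. `ParentSessionSketch` and `ClientSessionSketch` are hypothetical classes, not the QUICHE `QuicSpdyClientSession` or Envoy's client session, and the "no stream credit" condition is an assumed example chosen only to show that the parent returning `false` is a legitimate state. The override returns `true` unconditionally, which the commit message says is done to avoid a nullptr dereference in the caller.

```cpp
#include <cassert>

// Hypothetical parent: refuses new streams when no stream credit is left
// (an assumed reason; it only shows that `false` is a valid answer).
class ParentSessionSketch {
public:
  virtual ~ParentSessionSketch() = default;
  virtual bool ShouldCreateOutgoingBidirectionalStream() {
    return available_streams_ > 0;
  }
  int available_streams_ = 0;
};

// Hypothetical client override mirroring the behaviour described above.
class ClientSessionSketch : public ParentSessionSketch {
public:
  bool ShouldCreateOutgoingBidirectionalStream() override {
    // Conceptually, the removed check was:
    //   ASSERT(ParentSessionSketch::ShouldCreateOutgoingBidirectionalStream());
    // It fails whenever the parent refuses, which is a valid state, so the
    // assertion is dropped and the override keeps returning true.
    return true;
  }
};
```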
by separating the tests

Hopefully fix: #44427

Signed-off-by: Ryan Northey <ryan@synca.io>
Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>

Co-authored-by: phlax <454682+phlax@users.noreply.github.com>
Copilot AI changed the title [WIP] Backport flake-fix stack from PR #44644 to release/v1.35 [bp/1.35] Flakes stack Apr 27, 2026
Copilot AI requested a review from phlax April 27, 2026 18:52
@phlax phlax closed this Apr 27, 2026
@phlax phlax deleted the copilot/backport-flake-fix-to-release-v1-35 branch April 30, 2026 21:09
Labels

None yet

Projects

None yet

3 participants