@A1K28 A1K28 commented Aug 16, 2025

Fixes #30813

This PR addresses a few issues for the beam_PreCommit_Python_Coverage workflow:

  1. Uses Docker-in-Docker (DinD) to fix the "no space left on device" error, which was caused by the tmpfs mounted at /var having only 1 MB of space.
  2. Downgrades the Milvus version to account for CPU instability.
  3. Adds parameters to conftest.py to make the test runs deterministic and fix flaky runs.
  4. Disables parallel runs to make the test runs deterministic.
  5. Uses highmem instead of main to increase stability.
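
For anyone reproducing item 1, here is a minimal sketch of how free space on a mount could be checked before starting containers. It is not part of this PR: `free_megabytes` and `has_enough_space` are hypothetical helper names, and the path and threshold are placeholders.

```python
import shutil


def free_megabytes(path: str) -> float:
    """Return the free space, in MiB, of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / (1024 * 1024)


def has_enough_space(path: str, required_mb: float) -> bool:
    """Return True if at least `required_mb` MiB are free at `path`."""
    return free_megabytes(path) >= required_mb


if __name__ == "__main__":
    # "." is used only so the sketch runs anywhere; the description above
    # names /var as the mount that ran out of space.
    print(f"free space: {free_megabytes('.'):.1f} MiB")
```

A check like this could run as an early step so that a nearly full mount is reported up front instead of surfacing later as "no space left on device".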

All of my runs succeeded except one, which failed with a downstream overload error that occurred only because I ran 8 workflows in parallel: https://github.com/A1K28/beam/actions/workflows/beam_PreCommit_Python_Coverage.yml?query=branch%3Acoverage-v6

This PR should fix most, if not all, of the flakiness in the workflow runs. As a side effect, the workflow will take longer to run, but disabling parallelism was necessary to fix many of the flaky failures.

IMPORTANT: The Python coverage workflow fails on this PR because the run does not use the updated workflow that sets up DinD. I am not sure why this happens. Here is the actual run: https://github.com/A1K28/beam/actions/runs/17007645420


@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @A1K28, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the determinism and stability of tests, particularly integration tests that rely on Docker-in-Docker (DinD) environments. It introduces a new GitHub Action for robust DinD setup and teardown, alongside comprehensive Python test configuration changes to enforce sequential execution, manage RPC timeouts, and ensure clean test states. These changes are crucial for reducing flaky test failures and ensuring consistent test results across different environments, ultimately improving the reliability of the CI/CD pipeline.

Highlights

  • New GitHub Action for Docker-in-Docker (DinD) Management: A new composite GitHub Action (dind-up-action/action.yml) has been introduced to standardize and robustly manage Docker-in-Docker (DinD) environments for CI/CD. This action handles DinD container lifecycle, volume management, health checks, and port mapping verification, ensuring a consistent and reliable test setup.
  • Enhanced Test Determinism and Stability Configuration: Extensive pytest fixtures have been added to conftest.py to enforce deterministic test execution. This includes configuring gRPC timeouts, forcing single bundle processing, ensuring deterministic element ordering, limiting worker parallelism, and crucially, forcing sequential pytest execution (PYTEST_XDIST_WORKER_COUNT=1) to prevent race conditions and flakiness.
  • Milvus Docker Image Version Downgrade: The Milvus Docker image version used in integration tests (milvus_search_it_test.py) has been downgraded from v2.5.10 to v2.3.9. This change likely addresses stability or compatibility issues encountered with the newer version in the automated test environment.
  • Improved Test Environment Variable Management: The tox.ini configuration has been updated to properly pass through and set environment variables related to Docker and Testcontainers (DOCKER_*, TESTCONTAINERS_*, TC_*). This ensures that the newly introduced DinD and test stability configurations are correctly applied within the Python test environments.
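
To illustrate the kind of determinism settings described in the highlights, here is a minimal sketch and not the code from this PR. `apply_deterministic_settings` and `DETERMINISTIC_ENV` are hypothetical names. Only `PYTEST_XDIST_WORKER_COUNT` is taken from the summary above; `PYTHONHASHSEED` and the random seed are assumed additions for illustration.

```python
import os
import random
from typing import MutableMapping, Optional

# Hypothetical settings for illustration. PYTEST_XDIST_WORKER_COUNT is the
# variable named in the summary; PYTHONHASHSEED is a standard CPython variable
# that only affects interpreter processes started after it is set.
DETERMINISTIC_ENV = {
    "PYTEST_XDIST_WORKER_COUNT": "1",
    "PYTHONHASHSEED": "0",
}


def apply_deterministic_settings(
    environ: Optional[MutableMapping[str, str]] = None,
    seed: int = 0,
) -> MutableMapping[str, str]:
    """Write the settings into `environ` (default: os.environ) and seed `random`."""
    target = os.environ if environ is None else environ
    target.update(DETERMINISTIC_ENV)
    random.seed(seed)  # makes later calls to the `random` module repeatable
    return target


if __name__ == "__main__":
    env = apply_deterministic_settings({}, seed=42)
    print(env["PYTEST_XDIST_WORKER_COUNT"])
```

In a conftest.py, a helper like this could be called from a session-scoped autouse fixture so the settings are in place before any test runs. Whether the actual fixtures in the PR are structured this way is not confirmed here.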

@A1K28 A1K28 force-pushed the 30813-make-tests-deterministic branch from 44d8715 to 6b664b2 August 16, 2025 16:51
A1K28 commented Aug 16, 2025

@Amar3tto

@A1K28 A1K28 marked this pull request as ready for review August 17, 2025 08:22
A1K28 commented Aug 17, 2025

R: @Abacn

@github-actions

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment "assign set of reviewers".

@Abacn Abacn left a comment

Thank you!

@damccorm damccorm left a comment

Thanks, this LGTM! Letting checks complete before merging

@damccorm

Looks like there are still some flakes in coverage, but I think this PR is still a step in the right direction

@damccorm damccorm merged commit a6f63ac into apache:master Aug 18, 2025
99 of 102 checks passed
A1K28 commented Aug 18, 2025

> Looks like there are still some flakes in coverage, but I think this PR is still a step in the right direction

Yeah, so the workflow fails here because it does not use the updated workflow to set up DinD. I am not sure why this happens in the PR.

A1K28 commented Aug 18, 2025

[Screenshot 2025-08-19 at 00 29 39]

There is no "Setup DinD" step here for some reason. I don't know why this happens in the PR.

@damccorm

Oh, it is because GitHub doesn't pick up workflow changes until they're submitted. So it should start running with those changes now

A1K28 commented Aug 18, 2025

I see, thanks! That's good to know.

DKER2 pushed a commit to DKER2/beam that referenced this pull request Aug 20, 2025
* make py39-cloudcoverage deterministic (apache#30813)

* increase grpc timeouts (apache#30813)

* fix linting (apache#30813)

* revert timeout increase (apache#30813)

Successfully merging this pull request may close these issues.

The PreCommit Python Coverage job is flaky
