sdks/python/scripts: fix posargs for `run_pytest.sh` #35733

mohamedawnallah · 2025-07-30T01:44:49Z

Description

This change set is supposed to fix the failing beam_PreCommit_Python_ML and beam_PostCommit_Python jobs, which regressed after PR #35698.

Motivation and Context

#35698 (comment)

gemini-code-assist

Summary of Changes

Hello @mohamedawnallah, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the run_pytest.sh script by refining how command-line arguments are processed. The change ensures that user-provided pytest markers are accurately identified and handled from the intended positional arguments, thereby improving the script's reliability when executing tests.

Highlights

Script Argument Handling Fix: I've corrected the variable used for parsing and manipulating pytest command-line arguments, specifically the -m (marker) flag, within the run_pytest.sh script. The script now correctly uses posargs instead of pytest_args for this logic.
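As a rough illustration of the marker handling described above, the sketch below separates a user-supplied `-m` marker from the remaining test paths once the arguments have already been split. It is not the actual `run_pytest.sh` logic: the function name `extract_marker` and the variables `user_marker` and `test_paths` are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the code from run_pytest.sh.
# Walks the positional arguments, pulls out the value that follows -m,
# and collects everything else as test paths.
extract_marker() {
  user_marker=""
  test_paths=()
  while (( $# > 0 )); do
    case "$1" in
      -m)
        if (( $# >= 2 )); then
          user_marker="$2"
          shift 2
        else
          # A trailing -m without a value is ignored.
          shift
        fi
        ;;
      *)
        test_paths+=("$1")
        shift
        ;;
    esac
  done
}

extract_marker apache_beam/ml/ -m "not require_docker_in_docker"
echo "marker=${user_marker}"
echo "paths=${test_paths[*]}"
```

Keeping the marker and the paths in separate variables makes it possible to combine the marker with other expressions without it leaking into the list of test targets.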

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request correctly refactors run_pytest.sh to parse the -m (marker) argument from posargs instead of pytest_args. The change is logical and directly addresses the goal stated in the pull request title.

My review includes two main points:

A suggestion to refactor the marker parsing logic to be more efficient and correct by using native bash features instead of sed.
A high-severity issue I've identified with how $posargs is passed to pytest's --pyargs option, which could lead to test failures.

sdks/python/scripts/run_pytest.sh

mohamedawnallah · 2025-07-30T03:12:49Z

/gemini review

mohamedawnallah · 2025-07-30T03:14:38Z

/gemini review

gemini-code-assist

Code Review

This pull request aims to fix how positional arguments are handled in the run_pytest.sh script, addressing a regression from a previous change. The modifications correctly target posargs for marker extraction and introduce more robust parsing logic using native bash features instead of sed.

While the intent is good and the move away from sed is a positive step for readability and performance, I've identified a few issues that need to be addressed:

Critical Security Vulnerability: The use of eval on user-controlled input introduces a command injection vulnerability.
Correctness Issues: The argument parsing logic is brittle and may fail for common pytest options. Additionally, the regular expression for unquoting strings is flawed.
Clarity: A debug message has become misleading due to the changes.

My review comments below provide detailed explanations and suggestions for each of these points. Addressing them will significantly improve the script's security and robustness.

gemini-code-assist · 2025-07-30T03:15:46Z

sdks/python/scripts/run_pytest.sh

+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"


The use of eval on $posargs introduces a critical command injection vulnerability. Since $posargs can be controlled by the user invoking tox, a malicious string could lead to arbitrary command execution on the system running the script.

For example, a user could pass '; rm -rf ~' as part of the arguments, and eval would execute the rm command.

Please replace eval with a safer method for parsing the arguments. While parsing shell arguments with quotes is a complex problem, using eval on untrusted input is not secure. Consider if the arguments can be passed from tox.ini in a way that they are already available as a shell array, which would avoid this parsing issue entirely.
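The concern can be reproduced with a harmless payload. In the sketch below the `posargs` value is a made-up example, and the injected command only assigns a variable instead of deleting anything. Because `eval` re-parses the string as shell code, the text after the semicolon runs as a second command.

```bash
#!/usr/bin/env bash
# Made-up input: a quoted marker expression followed by an injected command.
posargs="apache_beam/ml -m 'not slow'; injected=yes"

injected=no
# eval parses the string as shell source, so "; injected=yes" is executed.
eval "set -- $posargs"

argc=$#
third_arg=$3
echo "argc=${argc} third_arg=${third_arg} injected=${injected}"
```

The quoted marker survives as a single argument, which is the behavior the script relies on, but the trailing assignment also runs, which is the behavior the review comment warns about.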

@gemini-code-assist The alternatives would add more complexity to this script: looping through tokens, handling quotes, etc. Also, run_pytest.sh is run by CI environments through workflows, and those workflows change minimally and are reviewed, so I'm not sure this counts as "critical". I will leave it as is unless another reviewer has a different opinion.

I understand your concerns regarding increased complexity and the review process in CI environments. However, the use of eval with user-controlled input like $posargs inherently introduces a critical command injection vulnerability. This is a fundamental security flaw that allows arbitrary code execution, regardless of whether the script is run in a CI environment or how frequently it's reviewed.

Even in a controlled CI setting, a successful exploit could lead to severe consequences, such as data exfiltration, system compromise, or disruption of the build process. While parsing shell arguments safely can be complex, the security implications of eval on untrusted input are too significant to ignore. Prioritizing security often necessitates refactoring for safer alternatives, even if it introduces some additional complexity.

I maintain that this is a critical issue that should be addressed to prevent potential security breaches.

@gemini-code-assist please give me an alternative that fits nicely in this script instead of eval "set -- $posargs".

I understand your request for an alternative that "fits nicely" within the existing script structure while addressing the critical command injection vulnerability. The use of eval with user-controlled input like $posargs is inherently unsafe, as it allows arbitrary command execution if a malicious string is passed.

To safely parse the arguments, respecting quotes, without using eval, you can implement a manual parsing loop. This approach is more verbose but provides a robust and secure way to handle the argument splitting.

This code snippet replaces the eval call with a manual parser that correctly handles arguments enclosed in single or double quotes, ensuring that the script remains secure against command injection.

Suggested change

-eval "set -- $posargs"
+parsed_args=()
+current_arg=""
+in_quote='' # Can be '', "'", or '"'
+for (( i=0; i<${#posargs}; i++ )); do
+  char="${posargs:$i:1}"
+  if [[ "$in_quote" == '' ]]; then
+    if [[ "$char" == "'" || "$char" == "\"" ]]; then
+      in_quote="$char"
+    elif [[ "$char" == " " ]]; then
+      # Add current_arg to parsed_args if not empty
+      if [[ -n "$current_arg" ]]; then
+        parsed_args+=("$current_arg")
+        current_arg=""
+      fi
+    else
+      current_arg+="$char"
+    fi
+  else # Inside a quote
+    if [[ "$char" == "$in_quote" ]]; then
+      in_quote='' # End of quote
+    else
+      current_arg+="$char"
+    fi
+  fi
+done
+# Add the last argument if it's not empty
+if [[ -n "$current_arg" ]]; then
+  parsed_args+=("$current_arg")
+fi
+set -- "${parsed_args[@]}"
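To see how the suggested loop tokenizes a typical argument string, it can be wrapped in a function and fed a sample input. The wrapper name `split_posargs` and the sample string are assumptions for this sketch; the loop body follows the suggestion above.

```bash
#!/usr/bin/env bash
# split_posargs is a hypothetical wrapper around the suggested loop.
# It fills the global array parsed_args without ever evaluating the input.
split_posargs() {
  local input="$1" char current_arg="" in_quote="" i
  parsed_args=()
  for (( i = 0; i < ${#input}; i++ )); do
    char="${input:i:1}"
    if [[ -z "$in_quote" ]]; then
      if [[ "$char" == "'" || "$char" == '"' ]]; then
        in_quote="$char"            # Opening quote: start a quoted run.
      elif [[ "$char" == " " ]]; then
        if [[ -n "$current_arg" ]]; then
          parsed_args+=("$current_arg")
          current_arg=""
        fi
      else
        current_arg+="$char"
      fi
    elif [[ "$char" == "$in_quote" ]]; then
      in_quote=""                   # Closing quote: end the quoted run.
    else
      current_arg+="$char"
    fi
  done
  if [[ -n "$current_arg" ]]; then
    parsed_args+=("$current_arg")
  fi
}

injected=no
split_posargs "apache_beam/ml/ -m '(not require_docker_in_docker)' ; injected=yes"
printf '[%s]\n' "${parsed_args[@]}"
echo "injected=${injected}"
```

With this input the semicolon and the assignment end up as ordinary arguments, and `injected` keeps its original value. The loop has limits worth keeping in mind: an empty quoted argument such as `''` is dropped, backslash escapes are treated as literal characters, and only spaces (not tabs) separate arguments.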

@gemini-code-assist Can we make this change in a follow-up PR? Bash is tricky, and it takes me a while to test these changes. I am confident that the current change, eval "set -- $posargs", works functionally; non-functionally, in terms of security, it does not, as you said.

I think we don't need to worry about this. We're directly invoking this script and control what arguments are passed in. If anyone invokes this script, they already have full permissions to run arbitrary code.

sdks/python/scripts/run_pytest.sh

gemini-code-assist

Code Review

The code changes introduce fixes to positional arguments in run_pytest.sh. However, the current implementation introduces a command injection vulnerability due to the use of eval. Additionally, there are issues with the logic for detecting and stripping quotes. Addressing these security and correctness issues is crucial.

gemini-code-assist · 2025-07-30T03:18:57Z

sdks/python/scripts/run_pytest.sh

+if [[ $posargs == \"*\" ]]; then
+  # If wrapped in double quotes, remove them
+  posargs="${posargs:1:${#posargs}-2}"
+elif [[ $posargs == \'*\' ]]; then
+  # If wrapped in single quotes, remove them.
+  posargs="${posargs:1:${#posargs}-2}"
+fi


The conditions to check for surrounding quotes are incorrect. The patterns \"*\" and \'*\' will be interpreted by [[ as literal string comparisons against *, not as glob patterns that check for surrounding quotes.

To correctly check if a string starts and ends with a specific character using globbing, the pattern should be unquoted inside [[ ... ]]. For literal quotes, you need to quote them appropriately for the shell.

The corrected version will properly detect and strip surrounding quotes.

Suggested change

-if [[ $posargs == \"*\" ]]; then
-  # If wrapped in double quotes, remove them
-  posargs="${posargs:1:${#posargs}-2}"
-elif [[ $posargs == \'*\' ]]; then
-  # If wrapped in single quotes, remove them.
-  posargs="${posargs:1:${#posargs}-2}"
-fi
+if [[ $posargs == '"'*'"' ]]; then
+  # If wrapped in double quotes, remove them
+  posargs="${posargs:1:${#posargs}-2}"
+elif [[ $posargs == "'"*"'" ]]; then
+  # If wrapped in single quotes, remove them.
+  posargs="${posargs:1:${#posargs}-2}"
+fi
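The two spellings of the pattern can be compared directly in bash. In the sketch below the helper names are hypothetical. Inside `[[ ... ]]`, a backslash-escaped quote and a quoted quote both stand for a literal quote character, while the unquoted `*` stays a wildcard, so both forms should accept the same strings.

```bash
#!/usr/bin/env bash
# Hypothetical helpers comparing the two pattern spellings.
is_wrapped_escaped() { [[ $1 == \"*\" ]]; }     # backslash-escaped quotes
is_wrapped_quoted()  { [[ $1 == '"'*'"' ]]; }   # quotes protected by quoting

# Strips one layer of matching surrounding quotes, as the snippet above does.
strip_wrapping_quotes() {
  local s="$1"
  if [[ $s == \"*\" || $s == \'*\' ]]; then
    s="${s:1:${#s}-2}"
  fi
  printf '%s\n' "$s"
}

for value in '"apache_beam/ml -m slow"' 'apache_beam/ml' '*'; do
  is_wrapped_escaped "$value" && escaped=match || escaped=no-match
  is_wrapped_quoted "$value" && quoted=match || quoted=no-match
  printf '%-28s escaped=%s quoted=%s\n' "$value" "$escaped" "$quoted"
done

strip_wrapping_quotes '"apache_beam/ml -m slow"'
```

Running a check like this before choosing between the two forms helps confirm whether the change affects behavior or only readability.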

sdks/python/scripts/run_pytest.sh

mohamedawnallah · 2025-07-30T03:54:26Z

cc @damccorm, @liferoad: this fixes a bug in run_pytest.sh that resulted from my changes in #35655. It was mainly caused by confusion between posargs and pytest_args.

mohamedawnallah · 2025-07-30T04:02:32Z

Regarding the failed CI environment, there seem to be more dependency issues related to ubuntu-latest. I have not dug deeper for an immediate resolution, but I think a temporary solution is for this PR to fix the bug first, and then for #35734 to temporarily revert the ubuntu-latest environment until those dependency issues are addressed.

github-actions · 2025-07-30T05:13:53Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

mohamedawnallah · 2025-07-30T10:35:41Z

cc: @Abacn (#35734 (comment))

mohamedawnallah · 2025-07-30T10:36:52Z

I can only test this minimally, especially with regard to the workflows. It would be great if someone with permissions could test whether the patch in this PR fixes the failing CI tests.

PS:
The workflows are modified, but the failed CI tests below do not reflect those modifications because they run on pull_request_target.

github-actions · 2025-07-30T12:12:34Z

Assigning reviewers:

R: @claudevdm for label python.
R: @damccorm for label build.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

damccorm

I can only test this minimally, especially with regard to the workflows. It would be great if someone with permissions could test whether the patch in this PR fixes the failing CI tests.

It's hard to test this without merging, since we'd need to run off of a branch in the main repo. I'm going to merge, and then we can iterate from there if needed.

Thanks!

damccorm · 2025-07-30T13:50:25Z

sdks/python/scripts/run_pytest.sh

+test_paths=""
+
+# Safely split the posargs string into individual arguments.
+eval "set -- $posargs"


I think we don't need to worry about this. We're directly invoking this script and control what arguments are passed in. If anyone invokes this script, they already have full permissions to run arbitrary code.

damccorm · 2025-07-30T13:57:46Z

Kicking off some runs.

Postcommit Python - https://github.com/apache/beam/actions/runs/16624622172
Precommit ML - https://github.com/apache/beam/actions/runs/16624612326

shunping · 2025-07-30T15:07:04Z

Kicking off some runs.

Postcommit Python - https://github.com/apache/beam/actions/runs/16624622172 Precommit ML - https://github.com/apache/beam/actions/runs/16624612326

Looks like there is still an error on Precommit ML:

Running sequential tests with: pytest -m "(not and (no_xdist)"  --pyargs  py39-ml 'apache_beam/ml/ -m (not require_docker_in_docker)'
ERROR: module or package not found: py39-ml (missing __init__.py?)

mohamedawnallah · 2025-07-30T15:19:24Z

Kicking off some runs.
Postcommit Python - https://github.com/apache/beam/actions/runs/16624622172 Precommit ML - https://github.com/apache/beam/actions/runs/16624612326

Looks like there is still an error on Precommit ML:
Running sequential tests with: pytest -m "(not and (no_xdist)"  --pyargs  py39-ml 'apache_beam/ml/ -m (not require_docker_in_docker)'
ERROR: module or package not found: py39-ml (missing __init__.py?)

Included a patch for this #35740

sdks/python/scripts: fix posargs for run_pytest.sh

344a9ce

github-actions bot added the python label Jul 30, 2025

gemini-code-assist bot reviewed Jul 30, 2025

sdks/python/scripts: isolate test paths from pyargs

3a94a87

gemini-code-assist bot reviewed Jul 30, 2025

mohamedawnallah added 2 commits July 30, 2025 03:47

sdks/python/scripts: address some gemini feedback

4d5f702

.github: trigger postcommit python

864683c

mohamedawnallah mentioned this pull request Jul 30, 2025

.github: remove ubuntu-latest environment temporary from beam_PreCommit_Python_ML and beam_PostCommit_Python to make CI green again #35734

Merged

3 tasks

mohamedawnallah added 3 commits July 30, 2025 09:46

Merge remote-tracking branch 'upstream/master' into fixPytestScript

b819b70

.github: trigger postcommit python

0f6d8c0

.github+scripts: handle quotes properly for run_pytest.sh

151e278

github-actions bot added the build label Jul 30, 2025

.github: trigger postcommit python

f53d37b

github-actions bot added the Next Action: Reviewers label Jul 30, 2025

liferoad requested review from Amar3tto and damccorm July 30, 2025 13:32

damccorm approved these changes Jul 30, 2025

damccorm merged commit 480fcc8 into apache:master Jul 30, 2025
85 of 94 checks passed

mohamedawnallah mentioned this pull request Jul 30, 2025

sdks/python/scripts: capture (...) pytest marker group #35740

Merged

3 tasks

mohamedawnallah deleted the fixPytestScript branch August 2, 2025 19:28
