-
Notifications
You must be signed in to change notification settings - Fork 4.5k
sdks/python/scripts: fix posargs for run_pytest.sh
#35733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @mohamedawnallah, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a bug in the run_pytest.sh script by refining how command-line arguments are processed. The change ensures that user-provided pytest markers are accurately identified and handled from the intended positional arguments, thereby improving the script's reliability when executing tests.
Highlights
- Script Argument Handling Fix: I've corrected the variable used for parsing and manipulating
pytestcommand-line arguments, specifically the-m(marker) flag, within therun_pytest.shscript. The script now correctly usesposargsinstead ofpytest_argsfor this logic.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly refactors run_pytest.sh to parse the -m (marker) argument from posargs instead of pytest_args. The change is logical and directly addresses the goal stated in the pull request title.
My review includes two main points:
- A suggestion to refactor the marker parsing logic to be more efficient and correct by using native bash features instead of
sed. - A high-severity issue I've identified with how
$posargsis passed topytest's--pyargsoption, which could lead to test failures.
|
/gemini review |
1 similar comment
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix how positional arguments are handled in the run_pytest.sh script, addressing a regression from a previous change. The modifications correctly target posargs for marker extraction and introduce more robust parsing logic using native bash features instead of sed.
While the intent is good and the move away from sed is a positive step for readability and performance, I've identified a few issues that need to be addressed:
- Critical Security Vulnerability: The use of
evalon user-controlled input introduces a command injection vulnerability. - Correctness Issues: The argument parsing logic is brittle and may fail for common
pytestoptions. Additionally, the regular expression for unquoting strings is flawed. - Clarity: A debug message has become misleading due to the changes.
My review comments below provide detailed explanations and suggestions for each of these points. Addressing them will significantly improve the script's security and robustness.
| test_paths="" | ||
|
|
||
| # Safely split the posargs string into individual arguments. | ||
| eval "set -- $posargs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of eval on $posargs introduces a critical command injection vulnerability. Since $posargs can be controlled by the user invoking tox, a malicious string could lead to arbitrary command execution on the system running the script.
For example, a user could pass '; rm -rf ~' as part of the arguments, and eval would execute the rm command.
Please replace eval with a safer method for parsing the arguments. While parsing shell arguments with quotes is a complex problem, using eval on untrusted input is not secure. Consider if the arguments can be passed from tox.ini in a way that they are already available as a shell array, which would avoid this parsing issue entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemini-code-assist Other alternatives would result in adding more complexity to this script in terms of looping through tokens, handling quotes, ...etc. Also run_pytest.sh run by CI environments through workflows and there minimal change happen to them and reviewed so not sure if it is considered "critical". I will leave it as is unless other reviewer has another opinion on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concerns regarding increased complexity and the review process in CI environments. However, the use of eval with user-controlled input like $posargs inherently introduces a critical command injection vulnerability. This is a fundamental security flaw that allows arbitrary code execution, regardless of whether the script is run in a CI environment or how frequently it's reviewed.
Even in a controlled CI setting, a successful exploit could lead to severe consequences, such as data exfiltration, system compromise, or disruption of the build process. While parsing shell arguments safely can be complex, the security implications of eval on untrusted input are too significant to ignore. Prioritizing security often necessitates refactoring for safer alternatives, even if it introduces some additional complexity.
I maintain that this is a critical issue that should be addressed to prevent potential security breaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemini-code-assist please give me alternative that fits nicely in this script instead of eval "set -- $posargs"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your request for an alternative that "fits nicely" within the existing script structure while addressing the critical command injection vulnerability. The use of eval with user-controlled input like $posargs is inherently unsafe, as it allows arbitrary command execution if a malicious string is passed.
To safely parse the arguments, respecting quotes, without using eval, you can implement a manual parsing loop. This approach is more verbose but provides a robust and secure way to handle the argument splitting.
This code snippet replaces the eval call with a manual parser that correctly handles arguments enclosed in single or double quotes, ensuring that the script remains secure against command injection.
| eval "set -- $posargs" | |
| parsed_args=() | |
| current_arg="" | |
| in_quote='' # Can be '', "'", or '"' | |
| for (( i=0; i<${#posargs}; i++ )); do | |
| char="${posargs:$i:1}" | |
| if [[ "$in_quote" == '' ]]; then | |
| if [[ "$char" == "'" || "$char" == "\"" ]]; then | |
| in_quote="$char" | |
| elif [[ "$char" == " " ]]; then | |
| # Add current_arg to parsed_args if not empty | |
| if [[ -n "$current_arg" ]]; then | |
| parsed_args+=("$current_arg") | |
| current_arg="" | |
| fi | |
| else | |
| current_arg+="$char" | |
| fi | |
| else # Inside a quote | |
| if [[ "$char" == "$in_quote" ]]; then | |
| in_quote='' # End of quote | |
| else | |
| current_arg+="$char" | |
| fi | |
| fi | |
| done | |
| # Add the last argument if it's not empty | |
| if [[ -n "$current_arg" ]]; then | |
| parsed_args+=("$current_arg") | |
| fi | |
| set -- "${parsed_args[@]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gemini-code-assist We can do this change in follow-up PR? Bash is tricky and take me a bit of time to test them. I can say I am confident that current changes eval "set -- $posargs" work functionally non functionally in terms of security as you said not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to worry about this. We're directly invoking this script and control what arguments are passed in. If anyone invokes this script, they already have full permissions to run arbitrary code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The code changes introduce fixes to positional arguments in run_pytest.sh. However, the current implementation introduces a command injection vulnerability due to the use of eval. Additionally, there are issues with the logic for detecting and stripping quotes. Addressing these security and correctness issues is crucial.
sdks/python/scripts/run_pytest.sh
Outdated
| if [[ $posargs == \"*\" ]]; then | ||
| # If wrapped in double quotes, remove them | ||
| posargs="${posargs:1:${#posargs}-2}" | ||
| elif [[ $posargs == \'*\' ]]; then | ||
| # If wrapped in single quotes, remove them. | ||
| posargs="${posargs:1:${#posargs}-2}" | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditions to check for surrounding quotes are incorrect. The patterns \"*\" and \'*\' will be interpreted by [[ as literal string comparisons against *, not as glob patterns that check for surrounding quotes.
To correctly check if a string starts and ends with a specific character using globbing, the pattern should be unquoted inside [[ ... ]]. For literal quotes, you need to quote them appropriately for the shell.
The corrected version will properly detect and strip surrounding quotes.
| if [[ $posargs == \"*\" ]]; then | |
| # If wrapped in double quotes, remove them | |
| posargs="${posargs:1:${#posargs}-2}" | |
| elif [[ $posargs == \'*\' ]]; then | |
| # If wrapped in single quotes, remove them. | |
| posargs="${posargs:1:${#posargs}-2}" | |
| fi | |
| if [[ $posargs == '"'*'"' ]]; then | |
| # If wrapped in double quotes, remove them | |
| posargs="${posargs:1:${#posargs}-2}" | |
| elif [[ $posargs == "'"*"'" ]]; then | |
| # If wrapped in single quotes, remove them. | |
| posargs="${posargs:1:${#posargs}-2}" | |
| fi |
|
Regards the failed CI environment it seems there are more dependencies issues related to |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
cc: @Abacn (#35734 (comment)) |
|
I can minimally test this especially regards the workflows it would be great if someone has permissions and can test the patch in this PR fix the failed CI tests PS: |
|
Assigning reviewers: R: @claudevdm for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
damccorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can minimally test this especially regards the workflows it would be great if someone has permissions and can test the patch in this PR fix the failed CI tests
Its hard to test this without merging since we'd need to run off of a branch in the main repo. I'm going to merge and then we can iterate from there if needed.
Thanks!
| test_paths="" | ||
|
|
||
| # Safely split the posargs string into individual arguments. | ||
| eval "set -- $posargs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to worry about this. We're directly invoking this script and control what arguments are passed in. If anyone invokes this script, they already have full permissions to run arbitrary code.
|
Kicking off some runs. Postcommit Python - https://github.com/apache/beam/actions/runs/16624622172 |
Looks like there is still an error on Precommit ML: |
Included a patch for this #35740 |
Description
This change set supposed to fix failed regressions in
beam_PreCommit_Python_MLandbeam_PostCommit_Pythonas regression of this PR #35698.Motivation and Context
#35698 (comment)
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.