Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/trigger_files/beam_PostCommit_Python.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"comment": "Modify this file in a trivial way to cause this test suite to run.",
"modification": 4
"modification": 29
}

4 changes: 2 additions & 2 deletions .github/workflows/beam_PostCommit_Python.yml
Original file line number Diff line number Diff line change
Expand Up @@ -108,8 +108,8 @@ jobs:
-PuseWheelDistribution \
-Pposargs="${{
contains(matrix.os, 'self-hosted') &&
'-m ''not require_docker_in_docker''' ||
'-m ''require_docker_in_docker'''
'-m (not require_docker_in_docker)' ||
'-m require_docker_in_docker'
}}" \
-PpythonVersion=${{ matrix.python_version }} \
env:
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/beam_PreCommit_Python_ML.yml
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,8 @@ jobs:
arguments: |
-Pposargs="${{
contains(matrix.os, 'self-hosted') &&
'apache_beam/ml/ -m ''not require_docker_in_docker''' ||
'apache_beam/ml/ -m ''require_docker_in_docker'''
'apache_beam/ml/ -m (not require_docker_in_docker)' ||
'apache_beam/ml/ -m require_docker_in_docker'
}}" \
-PpythonVersion=${{ matrix.python_version }}
- name: Archive Python Test Results
Expand Down
95 changes: 79 additions & 16 deletions sdks/python/scripts/run_pytest.sh
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,15 @@ envname=${1?First argument required: suite base name}
posargs=$2
pytest_args=$3

# strip leading/trailing quotes from posargs because it can get double quoted as its passed through.
posargs=$(sed -e 's/^"//' -e 's/"$//' -e "s/'$//" -e "s/^'//" <<<$posargs)
# strip leading/trailing quotes from posargs because it can get double quoted as
# its passed through.
if [[ $posargs == '"'*'"' ]]; then
# If wrapped in double quotes, remove them
posargs="${posargs:1:${#posargs}-2}"
elif [[ $posargs == "'"*"'" ]]; then
# If wrapped in single quotes, remove them.
posargs="${posargs:1:${#posargs}-2}"
fi
echo "pytest_args: $pytest_args"
echo "posargs: $posargs"

Expand All @@ -41,15 +48,26 @@ marker_regex="-m\s+('[^']+'|\"[^\"]+\"|[^ ]+)"
# Initialize the user_marker variable.
user_marker=""

# Isolate the user-provided -m argument (matching only).
if [[ $pytest_args =~ "-m" ]]; then
# Extract the marker value using the defined regex.
user_marker=$(echo "$pytest_args" | sed -nE "s/.*$marker_regex.*/\1/p")
fi
# Define regex pattern for quoted strings
quotes_regex="^[\"\'](.*)[\"\']$"

# Remove the -m argument from pytest_args (substitution only).
if [[ -n $user_marker ]]; then
pytest_args=$(echo "$pytest_args" | sed -E "s/$marker_regex//")
# Extract the user markers.
if [[ $posargs =~ $marker_regex ]]; then
# Get the full match including -m and the marker.
full_match="${BASH_REMATCH[0]}"

# Get the marker with quotes (this is the first capture group).
quoted_marker="${BASH_REMATCH[1]}"

# Remove any quotes around the marker.
if [[ $quoted_marker =~ $quotes_regex ]]; then
user_marker="${BASH_REMATCH[1]}"
else
user_marker="$quoted_marker"
fi

# Remove the entire -m marker portion from posargs.
posargs="${posargs/$full_match/}"
fi

# Combine user-provided marker with script's internal logic.
Expand All @@ -58,20 +76,65 @@ marker_for_sequential_tests="no_xdist"

if [[ -n $user_marker ]]; then
# Combine user marker with internal markers.
marker_for_parallel_tests="($user_marker) and ($marker_for_parallel_tests)"
marker_for_sequential_tests="($user_marker) and ($marker_for_sequential_tests)"
marker_for_parallel_tests="$user_marker and ($marker_for_parallel_tests)"
marker_for_sequential_tests="$user_marker and ($marker_for_sequential_tests)"
fi

# Parse posargs to separate pytest options from test paths.
options=""
test_paths=""

# Safely split the posargs string into individual arguments.
eval "set -- $posargs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The use of eval on $posargs introduces a critical command injection vulnerability. Since $posargs can be controlled by the user invoking tox, a malicious string could lead to arbitrary command execution on the system running the script.

For example, a user could pass '; rm -rf ~' as part of the arguments, and eval would execute the rm command.

Please replace eval with a safer method for parsing the arguments. While parsing shell arguments with quotes is a complex problem, using eval on untrusted input is not secure. Consider if the arguments can be passed from tox.ini in a way that they are already available as a shell array, which would avoid this parsing issue entirely.

Copy link
Contributor Author

@mohamedawnallah mohamedawnallah Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist Other alternatives would result in adding more complexity to this script in terms of looping through tokens, handling quotes, ...etc. Also run_pytest.sh run by CI environments through workflows and there minimal change happen to them and reviewed so not sure if it is considered "critical". I will leave it as is unless other reviewer has another opinion on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concerns regarding increased complexity and the review process in CI environments. However, the use of eval with user-controlled input like $posargs inherently introduces a critical command injection vulnerability. This is a fundamental security flaw that allows arbitrary code execution, regardless of whether the script is run in a CI environment or how frequently it's reviewed.

Even in a controlled CI setting, a successful exploit could lead to severe consequences, such as data exfiltration, system compromise, or disruption of the build process. While parsing shell arguments safely can be complex, the security implications of eval on untrusted input are too significant to ignore. Prioritizing security often necessitates refactoring for safer alternatives, even if it introduces some additional complexity.

I maintain that this is a critical issue that should be addressed to prevent potential security breaches.

Copy link
Contributor Author

@mohamedawnallah mohamedawnallah Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist please give me alternative that fits nicely in this script instead of eval "set -- $posargs"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your request for an alternative that "fits nicely" within the existing script structure while addressing the critical command injection vulnerability. The use of eval with user-controlled input like $posargs is inherently unsafe, as it allows arbitrary command execution if a malicious string is passed.

To safely parse the arguments, respecting quotes, without using eval, you can implement a manual parsing loop. This approach is more verbose but provides a robust and secure way to handle the argument splitting.

This code snippet replaces the eval call with a manual parser that correctly handles arguments enclosed in single or double quotes, ensuring that the script remains secure against command injection.

Suggested change
eval "set -- $posargs"
parsed_args=()
current_arg=""
in_quote='' # Can be '', "'", or '"'
for (( i=0; i<${#posargs}; i++ )); do
char="${posargs:$i:1}"
if [[ "$in_quote" == '' ]]; then
if [[ "$char" == "'" || "$char" == "\"" ]]; then
in_quote="$char"
elif [[ "$char" == " " ]]; then
# Add current_arg to parsed_args if not empty
if [[ -n "$current_arg" ]]; then
parsed_args+=("$current_arg")
current_arg=""
fi
else
current_arg+="$char"
fi
else # Inside a quote
if [[ "$char" == "$in_quote" ]]; then
in_quote='' # End of quote
else
current_arg+="$char"
fi
fi
done
# Add the last argument if it's not empty
if [[ -n "$current_arg" ]]; then
parsed_args+=("$current_arg")
fi
set -- "${parsed_args[@]}"

Copy link
Contributor Author

@mohamedawnallah mohamedawnallah Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gemini-code-assist We can do this change in follow-up PR? Bash is tricky and take me a bit of time to test them. I can say I am confident that current changes eval "set -- $posargs" work functionally non functionally in terms of security as you said not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to worry about this. We're directly invoking this script and control what arguments are passed in. If anyone invokes this script, they already have full permissions to run arbitrary code.


# Iterate through arguments.
while [[ $# -gt 0 ]]; do
arg="$1"
shift

# If argument starts with dash, it's an option.
if [[ "$arg" == -* ]]; then
options+=" $arg"

# Check if there's a next argument and it doesn't start with a dash.
# This assumes it's a value for the current option.
if [[ $# -gt 0 && "$1" != -* ]]; then
# Get the next argument.
next_arg="$1"

# Check if it's quoted and remove quotes if needed.
if [[ $next_arg =~ $quotes_regex ]]; then
# Extract the content inside quotes.
next_arg="${BASH_REMATCH[1]}"
fi

# Add the unquoted value to options.
options+=" $next_arg"
shift
fi
else
# Otherwise it's a test path.
test_paths+=" $arg"
fi
done

# Construct the final pytest command arguments.
pyargs_section=""
if [[ -n "$test_paths" ]]; then
pyargs_section="--pyargs $test_paths"
fi
pytest_command_args="$options $pyargs_section"

# Run tests in parallel.
echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" $pytest_args"
echo "Running parallel tests with: pytest -m \"$marker_for_parallel_tests\" $pytest_command_args"
pytest -v -rs -o junit_suite_name=${envname} \
--junitxml=pytest_${envname}.xml -m "$marker_for_parallel_tests" -n 6 --import-mode=importlib ${pytest_args} --pyargs ${posargs}
--junitxml=pytest_${envname}.xml -m "$marker_for_parallel_tests" -n 6 --import-mode=importlib ${pytest_args} ${pytest_command_args}
status1=$?

# Run tests sequentially.
echo "Running sequential tests with: pytest -m \"$marker_for_sequential_tests\" $pytest_args"
echo "Running sequential tests with: pytest -m \"$marker_for_sequential_tests\" $pytest_command_args"
pytest -v -rs -o junit_suite_name=${envname}_no_xdist \
--junitxml=pytest_${envname}_no_xdist.xml -m "$marker_for_sequential_tests" --import-mode=importlib ${pytest_args} --pyargs ${posargs}
--junitxml=pytest_${envname}_no_xdist.xml -m "$marker_for_sequential_tests" --import-mode=importlib ${pytest_args} ${pytest_command_args}
status2=$?

# Exit with error if no tests were run in either suite (status code 5).
Expand Down
Loading