Conversation

@markflyhigh markflyhigh commented Aug 6, 2019

Building the Python tarball concurrently in each test suite caused a race condition and made the postcommit flaky (see BEAM-7527).

With this change, the Python tarball is built once by sdks:python:sdist and then reused everywhere. To use this tarball in other projects, tests and other tasks need to depend on sdks:python:sdist, and projects can use dependencies to reference distTarBall from the sdks:python configurations.

Tests are needed, especially the Dataflow VR tests, which are affected by the build race condition.
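
The wiring described above could look roughly like the following sketch. It is not the exact code from this PR: the tarball path is hypothetical, and only the names sdks:python, sdist and distTarBall come from the description.

```groovy
// sdks/python/build.gradle (producer side, sketch only)
configurations { distTarBall }

// Hypothetical location; the real sdist task decides where the tarball lands.
def tarball = file("${buildDir}/apache-beam.tar.gz")

artifacts {
  // Attach the file to the configuration and record that the sdist task builds it.
  distTarBall(tarball) { builtBy 'sdist' }
}

// build.gradle of a consuming test-suite project (sketch only)
configurations { distTarBall }

dependencies {
  // Project dependency on the named configuration of :sdks:python.
  distTarBall project(path: ':sdks:python', configuration: 'distTarBall')
}
```

With this shape, every consumer resolves the same file, so the tarball is built by a single task instead of once per suite.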

+R: @tvalentyn

Update:
beam_PostCommit_Py_VR_Dataflow_PR #119 is done.


  def cmdArgs = project.mapToArgString([
      "test_opts": testOpts,
-     "sdk_location": "${project.buildDir}/apache-beam.tar.gz",
+     "sdk_location": files(configurations.distTarBall.files).singleFile,
Contributor Author

This line can actually be removed from all similar integration tests, since the default sdk_location (defined in run_integration_test.sh) already points to the correct location. However, I left it in so that people reading this configuration can see how it is set and how it relates to :sdks:python:sdist. I'm open to discussion if you think it should be removed.

Contributor

I am ok to keep this. It feels like there may be a shorter way to reference the file defined by this configuration. Did you try configurations.distTarBall? Somehow it seems to work in

from configurations.sdkSourceTarball

Contributor Author

I'm afraid not. We need the full path of this tarball, but configurations.distTarBall returns the string configuration ':sdks:python:test-suites:dataflow:py2:distTarBall'.
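
To illustrate the difference, here is a small sketch; the task name printSdkLocation is hypothetical. Interpolating the configuration object prints its description, while resolving it yields the file. Since a Gradle Configuration is itself a FileCollection, configurations.distTarBall.singleFile may be a shorter equivalent, though that was not verified in this thread.

```groovy
task printSdkLocation {
  // A configuration is Buildable, so this schedules whatever task builds its artifacts.
  dependsOn configurations.distTarBall
  doLast {
    // toString() of the configuration: a description, not a usable path.
    println "as string: ${configurations.distTarBall}"
    // Form used in this PR: resolve the files, then take the only one.
    println "as file:   ${files(configurations.distTarBall.files).singleFile}"
    // Assumed-equivalent shorter form via FileCollection.getSingleFile().
    println "shorter:   ${configurations.distTarBall.singleFile}"
  }
}
```

Resolution happens inside doLast so the configuration is not resolved while the build is still being configured.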

@markflyhigh
Contributor Author

Run Python Dataflow ValidatesRunner

@markflyhigh
Contributor Author

markflyhigh commented Aug 6, 2019

Glad to see that ModuleNotFoundError disappeared in beam_PostCommit_Py_VR_Dataflow_PR #119; only one test failed (among 8 suites), which seems to have been caused by pip service flakiness.

I'll go ahead and fix the broken tox tests and also run the postcommit.

@markflyhigh
Contributor Author

beam_PreCommit_Python_Commit #7937 passed. Tox tests are fixed.

@markflyhigh
Contributor Author

Run Python 2 PostCommit

@markflyhigh
Contributor Author

Run Python 3.7 PostCommit

Contributor

@tvalentyn tvalentyn left a comment

Thanks, @markflyhigh !

  project.exec {
    executable 'sh'
-   args '-c', ". ${project.ext.envdir}/bin/activate && cd ${copiedPyRoot} && scripts/run_tox.sh $tox_env ${project.buildDir}/apache-beam.tar.gz"
+   args '-c', ". ${project.ext.envdir}/bin/activate && cd ${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall"
Contributor

Unrelated to this PR: do you remember why we pass a tarball to the tox suite? It looks like we started doing that in #8067.

Contributor Author

@markflyhigh markflyhigh Aug 7, 2019

tox automatically builds a tarball for the venv install if one is not provided. That build depends on shared files when running in parallel, which makes our tests flaky. So we prebuild the tarball and pass it via the --installpkg flag to avoid the issue.
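
As a rough sketch of that idea, a Gradle Exec task could hand the prebuilt tarball to tox directly. The task name and the tox environment below are hypothetical, and the real build goes through scripts/run_tox.sh rather than calling tox like this.

```groovy
task toxWithPrebuiltTarball(type: Exec) {
  // Make sure the shared tarball exists before tox starts.
  dependsOn configurations.distTarBall
  doFirst {
    // Resolve the tarball at execution time, then let tox install it
    // instead of building its own sdist.
    def tarball = files(configurations.distTarBall.files).singleFile
    commandLine 'tox', '-e', 'py37', '--installpkg', tarball
  }
}
```

Because no suite runs its own sdist build, parallel tox runs no longer compete over the shared build files.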

// Set run order for basic tasks.
// This should be called after applyPythonNature() since TaskContainer
// requires task instances created first before setting the order.
project.ext.setTaskOrder = {
Contributor

Hm... this feels a bit hacky. setupVirtualEnv is already in dependsOn of installGcpTest.

As for sdks:python:sdist, is it possible to add a dependency on distTarBall configuration instead?

Contributor Author

You are right, setting the run order between setupVirtualenv and installGcpTest is not required here. The main purpose is to make installGcpTest run after sdks:python:sdist in each project.

I don't know whether depending on distTarBall will work, but it is worth a try; if it does, we may be able to get rid of setTaskOrder.

Contributor Author

It seems we can do installGcpTest.mustRunAfter configurations.distTarBall. This build shows the correct run order. I'll go ahead and change the code.
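
A minimal sketch of that ordering rule, assuming installGcpTest is already defined in the project. In Gradle, mustRunAfter only orders tasks that are already scheduled, while dependsOn would also pull the tarball build into the task graph; the commented-out alternative is shown for contrast and is not what the comment above proposes.

```groovy
// Ordering only: if the tarball build is scheduled, run installGcpTest after it.
installGcpTest.mustRunAfter configurations.distTarBall

// Alternative for contrast: also schedule the task that builds the tarball.
// installGcpTest.dependsOn configurations.distTarBall
```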

@markflyhigh
Contributor Author

Run Python 2 PostCommit

@markflyhigh
Contributor Author

The Java PreCommit failure is not related to this change.
An integration test failed in the verification step in Python PreCommit, but passed in Python2 PostCommit #60. It is likely a flaky test, and this change should not affect it.
Python2 PostCommit has been failing for a while due to https://issues.apache.org/jira/browse/BEAM-7924, which is also not related to this change.

Tests are done. PTAL @tvalentyn

@tvalentyn
Contributor

tvalentyn commented Aug 8, 2019

LGTM, thanks a lot, @markflyhigh. Please run all test suites affected by this change before merging. I would suggest merging this change on green and giving @lukecwik a heads-up, since Luke is looking into test health ATM.

@markflyhigh
Contributor Author

Run Python PreCommit

@markflyhigh
Contributor Author

Run Python 3.7 PostCommit

@markflyhigh
Contributor Author

+cc: @lukecwik

@markflyhigh
Contributor Author

markflyhigh commented Aug 8, 2019

Reran Python37_PostCommit #17 and Python_PreCommit #738; both passed.

@lukecwik
Member

lukecwik commented Aug 8, 2019

Run Java PreCommit

@lukecwik
Member

lukecwik commented Aug 8, 2019

Run Python 2 PostCommit

@lukecwik lukecwik merged commit b69c81a into apache:master Aug 8, 2019
@markflyhigh markflyhigh changed the title [BEAM-6907] Reuse Python tarball in tox & dataflow integration tests [BEAM-6907] Simply test setup by reusing Python tarball in tox & dataflow integration tests Sep 13, 2019