Skip to content

Conversation

@damccorm
Copy link
Contributor

@damccorm damccorm commented Mar 10, 2022

Right now, multiple go integration tests are excluded on Flink because they run into the following Flink error:

java.io.IOException: Insufficient number of network buffers: required 17, but only 14 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.

(the number of available/required network buffers is variable from run to run).

The underlying cause of this error is that we allow a greater parallelism than we have network memory to support for these tests (specifically, parallelism = # available cores[1]), and we run into issues when performing shuffle operations. This is expected to happen sometimes[1] depending on the workload, with the expectation that users will do the appropriate tuning. This PR fixes this problem by doubling the fraction of flink memory to be used as network memory from 10% to 20% and upping the max network memory for task executors from 1 gb to 2 gb (see more on these config options here - https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-network-fraction). This will help us avoid this issue while still staying within the bounds of a reasonable flink config.

[1] https://lists.apache.org/thread/p04qpcxxfvpqzowgy7lpos1gzdns4cnm


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@damccorm
Copy link
Contributor Author

Run Go Flink ValidatesRunner

@codecov
Copy link

codecov bot commented Mar 10, 2022

Codecov Report

Merging #17067 (574ec37) into master (fa8979c) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 574ec37 differs from pull request most recent head 1769be9. Consider uploading reports for the commit 1769be9 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17067      +/-   ##
==========================================
- Coverage   73.94%   73.94%   -0.01%     
==========================================
  Files         667      667              
  Lines       88056    88056              
==========================================
- Hits        65116    65112       -4     
- Misses      21829    21833       +4     
  Partials     1111     1111              
Flag Coverage Δ
python 83.64% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pache_beam/runners/interactive/interactive_beam.py 80.62% <0.00%> (-0.78%) ⬇️
...runners/interactive/display/pcoll_visualization.py 85.85% <0.00%> (-0.51%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 89.06% <0.00%> (-0.48%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.51% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/io/source_test_utils.py 89.40% <0.00%> (+1.38%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa8979c...1769be9. Read the comment docs.

@damccorm damccorm force-pushed the users/damccorm/flinkTests branch from e0d716c to 1769be9 Compare March 14, 2022 17:54
@damccorm
Copy link
Contributor Author

Run Go Flink ValidatesRunner

@damccorm damccorm changed the title WIP - try to fix [BEAM-12753] and [BEAM-12815] [BEAM-12753] and [BEAM-12815] Fix Flink Integration Tests Mar 14, 2022
Copy link
Contributor

@jrmccluskey jrmccluskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is a big win for helping us get testing for streaming features up and running

@damccorm
Copy link
Contributor Author

R: @youngoli for final approval

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @youngoli for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

FYI @ibzib since you could find the Flink config adjustment useful someday.

// TODO(BEAM-12753): Flink test stream fails for non-string/byte slice inputs
"TestTestStream.*Sequence.*",
// Triggers are not yet supported
"TestTrigger.*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try (in separate PRs) removing the other test filters as well, as I can see all of these being tightly related (timeouts and memory for debezium, and Triggers leaning on TestStream).

@lostluck lostluck merged commit 4976b2e into apache:master Mar 18, 2022
@damccorm damccorm deleted the users/damccorm/flinkTests branch March 30, 2022 19:09
@damccorm damccorm mentioned this pull request Nov 18, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants