-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-12735] Adding Python XLang examples to the RC validation script #15307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
R: @chamikaramj |
Codecov Report
@@ Coverage Diff @@
## master #15307 +/- ##
==========================================
- Coverage 83.81% 83.79% -0.03%
==========================================
Files 441 441
Lines 59745 59801 +56
==========================================
+ Hits 50075 50109 +34
- Misses 9670 9692 +22
Continue to review full report at Codecov.
|
chamikaramj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
| echo "You don't have gnome-terminal installed." | ||
| if [[ "$INSTALL_GNOME_TERMINAL" != true ]]; then | ||
| sudo apt-get upgrade | ||
| if [[ "$INSTALL_GNOME_TERMINAL" = true ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition here was inverted. Did we have a bug before ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so. The software needs to be installed when the variable is true.
| if [[ "$INSTALL_KUBECTL" = true ]]; then | ||
| sudo apt-get install kubectl | ||
| else | ||
| echo "kubectl is not installed. Validation on Python cross-language Kafka taxi will be skipped." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a failure for validation instead of skipping ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we already exit the program but only the printed message is misleading. Update the messages.
| CLUSTER_NAME=xlang-kafka-cluster-$RANDOM | ||
| if [[ "$python_xlang_kafka_taxi_dataflow" = true ]]; then | ||
| gcloud container clusters create --project=${USER_GCP_PROJECT} --region=${USER_GCP_REGION} --no-enable-ip-alias $CLUSTER_NAME | ||
| kubectl apply -R -f ${LOCAL_BEAM_DIR}/.test-infra/kubernetes/kafka-cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you confirm that works. I think we Beam Kafka IT is currently failing due to a port issue when starting up this cluster: https://issues.apache.org/jira/browse/BEAM-9482
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked with clouddfe project. I think there was no other program on clouddfe project using the same port assigned to k8s Kafka cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should get it to a state where the release manager can consistently run the test using the default (apache-beam-testing) project. (I'm not sure if you'll actually hit https://issues.apache.org/jira/browse/BEAM-9482 or not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's a separate work. We need to update k8s configs for dynamically assigning the ports.
| --runner DataflowRunner \ | ||
| --num_workers 5 \ | ||
| --temp_location=${USER_GCS_BUCKET}/temp/ \ | ||
| --experiments=use_runner_v2 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not need to manually specify this experiment for Beam 2.32.0 and later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
| --runner DataflowRunner \ | ||
| --num_workers 5 \ | ||
| --temp_location=${USER_GCS_BUCKET}/temp/ \ | ||
| --experiments=use_runner_v2 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto regarding experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
| echo "* How to verify results:" | ||
| echo "* 1. Goto your Dataflow job console and check whether there is any error." | ||
| echo "* 2. Check whether your ${SQL_TAXI_SUBSCRIPTION} subscription has data below:" | ||
| # run twice since the first execution would return 0 messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. I found that sometimes gcloud pubsub pull command just returns empty result (mostly when the first pull command after the subscription creation). Supposedly, this on-screen outputs only provide the hint to the release manager that any data exists in the sink. Visiting the web console might be needed if the hint doesn't help.
| sleep 10m | ||
| echo "* How to verify results:" | ||
| echo "* 1. Goto your Dataflow job console and check whether there is any error." | ||
| echo "* 2. Check whether ${KAFKA_TAXI_DF_DATASET}.xlang_kafka_taxi has data, retrieving BigQuery data as below: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be possible to run a 'grep' to confirm that the output is not empty ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be reliably possible since the data is constantly changing. Manual review is still important not only for this tests but also other existing validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data changes but can we just verify that the output is not empty ? Based on my observation there's always some output data for these pipelines after few minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Thanks. LGTM. |
|
Retest this please |
|
Run Python_PVR_Flink PreCommit |
|
Run Java PreCommit |
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunnercompliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.